1. Formulation of the main problems

To formulate the main problems discussed in this paper I first introduce some notation. Let $\xi_1,\dots,\xi_n$ be a sequence of independent and identically distributed random variables on a measurable space $(X,\mathcal X)$ with distribution $\mu$, and introduce their empirical distribution

$$\mu_n(A)=\frac1n\#\{j\colon \xi_j\in A,\ 1\le j\le n\},\quad A\in\mathcal X. \tag{1}$$

Given a measurable function $f(x_1,\dots,x_k)$ on the product space $(X^k,\mathcal X^k)$, let us consider the integral of this function with respect to the $k$-fold direct product of the normalized version $\sqrt n(\mu_n-\mu)$ of the empirical measure $\mu_n$, i.e. take the integral

$$J_{n,k}(f)=\frac{n^{k/2}}{k!}\int' f(x_1,\dots,x_k)(\mu_n(dx_1)-\mu(dx_1))\cdots(\mu_n(dx_k)-\mu(dx_k)), \tag{2}$$

where the prime in $\int'$ means that the diagonals $x_j=x_l$, $1\le j<l\le k$, are omitted from the domain of integration.
I am interested in the following two problems:

Problem a). Give a good estimate of the probabilities $P(J_{n,k}(f)>u)$ under some appropriate (and not too restrictive) conditions on the function $f$.

It seems natural to omit the diagonals $x_j=x_l$, $j\ne l$, from the domain of integration in the definition of the random integrals $J_{n,k}(f)$ in (2). In the applications I met, the estimation of such a version of the integrals was needed. I shall also discuss the following more general problem:

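Problem b). Let $\mathcal F$ be a nice class of functions on the space $(X^k,\mathcal X^k)$. Give a good estimate of the probabilities $P\left(\sup_{f\in\mathcal F}J_{n,k}(f)>u\right)$, where $J_{n,k}(f)$ denotes again the random integral of a function $f$ defined in (2).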

I met the problems formulated above when I tried to adapt the method of investigation of the limit behaviour of maximum likelihood estimates to more difficult problems, to so-called non-parametric maximum likelihood estimates. An important step in the investigation of maximum likelihood estimates consists of a good approximation of the maximum-likelihood function whose root we are looking for. The Taylor expansion of this function yields a good approximation if its higher order terms are dropped. In an adaptation of this method to more complicated situations the solution of the above mentioned problems a) and b) appears in a natural way. They play a role similar to the estimation of the coefficients of the Taylor expansion in the study of maximum likelihood estimates. Here I do not discuss the details of this approach to non-parametric maximum-likelihood problems. The interested reader may find some further information about it in papers [23] and |211, where such a question is investigated in detail in a special case.

In the above mentioned papers the so-called Kaplan-Meier method is investigated for the estimation of a distribution function by means of censored data. The solution of problem a) is needed to bound the error of the Kaplan-Meier estimate for a single argument of the distribution function, and the solution of problem b) helps to bound the difference of this estimate and the real distribution function in the supremum norm. Let me remark that the approach in these papers seems to be applicable under much more general circumstances, but this requires the solution of some hard problems.

I do not know of other authors who dealt directly with the study of random integrals similar to that defined in (2). On the other hand, several authors investigated the behaviour of $U$-statistics, and discussed the next two problems that I describe under the names problem a') and problem b'). To formulate them I first recall the notion of $U$-statistics.

If a sequence of independent and identically distributed random variables $\xi_1,\dots,\xi_n$ is given on a measurable space $(X,\mathcal X)$ together with a function of $k$ variables $f(x_1,\dots,x_k)$ on the space $(X^k,\mathcal X^k)$, $n\ge k$, then the expression

$$I_{n,k}(f)=\frac1{k!}\sum_{\substack{1\le j_s\le n,\ s=1,\dots,k\\ j_s\ne j_{s'}\ \text{if}\ s\ne s'}} f(\xi_{j_1},\dots,\xi_{j_k}) \tag{3}$$

is called a $U$-statistic of order $k$ with kernel function $f$. Now I formulate the following two problems.

Problem a'). Give a good estimate of the probabilities $P(n^{-k/2}I_{n,k}(f)>u)$ under some appropriate (and not too restrictive) conditions on the function $f$.

Problem b'). Let $\mathcal F$ be a nice class of functions on the space $(X^k,\mathcal X^k)$. Give a good estimate of the probabilities $P\left(\sup_{f\in\mathcal F}n^{-k/2}I_{n,k}(f)>u\right)$, where $I_{n,k}(f)$ denotes again the $U$-statistic with kernel function $f$ defined in (3).
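As a concrete illustration of definition (3), the following minimal Python sketch evaluates a $U$-statistic by brute force. The function name `u_statistic`, the kernel and the sample values are hypothetical choices made only for this illustration; the sketch assumes the normalization $1/k!$ in front of the sum over $k$-tuples of distinct indices.

```python
from itertools import permutations
from math import factorial


def u_statistic(kernel, sample, k):
    """Brute-force U-statistic of order k: (1/k!) times the sum of
    kernel(xi_{j_1}, ..., xi_{j_k}) over all k-tuples of distinct indices."""
    n = len(sample)
    if n < k:
        raise ValueError("the sample size n must be at least k")
    total = 0.0
    # permutations(range(n), k) enumerates ordered k-tuples of distinct indices
    for idx in permutations(range(n), k):
        total += kernel(*(sample[j] for j in idx))
    return total / factorial(k)


def product_kernel(x, y):
    """Symmetric kernel f(x, y) = x * y (hypothetical example)."""
    return x * y


# For a symmetric kernel the statistic reduces to a sum over unordered pairs.
sample = [1.0, -2.0, 3.0, 0.5]
value = u_statistic(product_kernel, sample, 2)
pair_sum = (sum(sample) ** 2 - sum(x * x for x in sample)) / 2.0
```

The enumeration costs of the order $n^k$ kernel evaluations, so the sketch is meant only for small samples.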


Problems a) and b) are closely related to problems a') and b'), but the detailed description of their relation demands some hard work. The main difference between these two pairs of problems is that integration with respect to a power of the measure $\mu_n-\mu$ in formula (2) means some kind of normalization, while the definition of the $U$-statistics in (3) contains no normalization. Moreover, there is no simple way to introduce some good normalization in $U$-statistics. This has the consequence that in problems a) and b) a good estimate can be given for a much larger class of functions than in problems a') and b'). Hence the original pair of problems seems to be more useful in several possible applications.

Both the integrals $J_{n,k}(f)$ defined in (2) and the $U$-statistics $I_{n,k}(f)$ defined in (3) are non-linear functionals of independent random variables, and the main difficulty in their study arises because of this non-linearity. On the other hand, the normalized empirical measure $\sqrt n(\mu_n-\mu)$ is close to a Gaussian field for a large sample size $n$. Moreover, as we shall see, $U$-statistics with a large sample size behave similarly to multiple Gaussian integrals. This suggests that the study of multiple Gaussian integrals may help a lot in the solution of our problems. To investigate them I first recall the definition of white noise that we shall need later.


Definition of a white noise with some reference measure. Let us have a $\sigma$-finite measure $\mu$ on a measurable space $(X,\mathcal X)$. A white noise with reference measure $\mu$ is a Gaussian field $\mu_W=\{\mu_W(A)\colon A\in\mathcal X,\ \mu(A)<\infty\}$, i.e. a set of jointly Gaussian random variables indexed by the above sets $A$, which satisfies the relations $E\mu_W(A)=0$ and $E\mu_W(A)\mu_W(B)=\mu(A\cap B)$.

Remark: In the definition of a white noise one also mentions the property $\mu_W(A\cup B)=\mu_W(A)+\mu_W(B)$ with probability 1 if $A\cap B=\emptyset$, and $\mu(A)<\infty$, $\mu(B)<\infty$. This could be omitted from the definition, because it follows from the remaining properties of white noises. Indeed, simple calculation shows that $E(\mu_W(A\cup B)-\mu_W(A)-\mu_W(B))^2=0$ if $A\cap B=\emptyset$, hence $\mu_W(A\cup B)-\mu_W(A)-\mu_W(B)=0$ with probability 1 in this case. It also can be observed that if some sets $A_1,\dots,A_k\in\mathcal X$, $\mu(A_j)<\infty$, $1\le j\le k$, are disjoint, then the random variables $\mu_W(A_j)$, $1\le j\le k$, are independent because of the uncorrelatedness of these jointly Gaussian random variables.
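The simple calculation mentioned in the remark can be written out as follows. It uses only the covariance relation of the definition, which gives $\mu(A\cap B)$ for the expectation of the product of the white noise at $A$ and at $B$. For disjoint sets $A$ and $B$ of finite measure, expanding the square term by term gives

```latex
\begin{aligned}
E\bigl(\mu_W(A\cup B)-\mu_W(A)-\mu_W(B)\bigr)^2
&= \mu(A\cup B)+\mu(A)+\mu(B) \\
&\quad -2\mu\bigl((A\cup B)\cap A\bigr)-2\mu\bigl((A\cup B)\cap B\bigr)+2\mu(A\cap B) \\
&= \mu(A\cup B)-\mu(A)-\mu(B)+2\mu(A\cap B) \\
&= 0,
\end{aligned}
```

since $(A\cup B)\cap A=A$, $(A\cup B)\cap B=B$, $\mu(A\cap B)=0$ and $\mu(A\cup B)=\mu(A)+\mu(B)$ for disjoint sets.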

It is not difficult to see that for an arbitrary reference measure $\mu$ on a space $(X,\mathcal X)$ a white noise $\mu_W$ with this reference measure really exists. This follows simply from Kolmogorov's fundamental theorem, by which if the finite dimensional distributions of a random field are prescribed in a consistent way, then there exists a random field with these finite dimensional distributions.

If a white noise $\mu_W$ with a $\sigma$-finite reference measure $\mu$ is given on some measurable space $(X,\mathcal X)$ together with a function $f(x_1,\dots,x_k)$ on $(X^k,\mathcal X^k)$ such that

$$\int f^2(x_1,\dots,x_k)\mu(dx_1)\dots\mu(dx_k)<\infty, \tag{4}$$

then the multiple Wiener-Itô integral of the function $f$ with respect to a white noise $\mu_W$ with reference measure $\mu$ can be defined (see e.g. [2] or m). It will be denoted by

$$Z_{\mu,k}(f)=\frac1{k!}\int f(x_1,\dots,x_k)\mu_W(dx_1)\dots\mu_W(dx_k). \tag{5}$$

Here we shall not need a detailed discussion of Wiener-Itô integrals; it will be enough to recall the idea of their definition.

Let us have a measurable space $(X,\mathcal X)$ together with a non-atomic $\sigma$-finite measure $\mu$ on it. (Wiener-Itô integrals are defined only with respect to a white noise $\mu_W$ with a non-atomic reference measure $\mu$.) We call a function $f$ on $(X^k,\mathcal X^k)$ elementary if there exists a finite partition $A_1,\dots,A_M$, $1\le M<\infty$, of the set $X$ (i.e. $A_j\cap A_{j'}=\emptyset$ if $j\ne j'$ and $\bigcup_{j=1}^M A_j=X$) such that $\mu(A_j)<\infty$ for all $1\le j\le M$, and the function $f$ satisfies the properties

$$\begin{aligned}
f(x_1,\dots,x_k)&=c(j_1,\dots,j_k)\quad\text{if } x_1\in A_{j_1},\dots,x_k\in A_{j_k},\quad 1\le j_s\le M,\ 1\le s\le k,\\
\text{and}\quad c(j_1,\dots,j_k)&=0\quad\text{if } j_s=j_{s'}\ \text{for some } 1\le s<s'\le k
\end{aligned}\tag{6}$$

with some real numbers $c(j_1,\dots,j_k)$, $1\le j_s\le M$, $1\le s\le k$, i.e. the function $f$ is constant on all $k$-dimensional rectangles $A_{j_1}\times\cdots\times A_{j_k}$, and it equals zero on those rectangles which have two sides which agree. (More precisely, we allow the exception $\mu(A_M)=\infty$, but in the case $\mu(A_M)=\infty$ we demand in formula (6) that $c(j_1,\dots,j_k)=0$ if one of the arguments $j_s$, $1\le s\le k$, equals $M$. In this case we omit from the sum in the next formula (7) those indices $(j_1,\dots,j_k)$ for which one of the coordinates of this vector equals $M$.)

The Wiener-Itô integral of the elementary function $f(x_1,\dots,x_k)$ in formula (6) with respect to a white noise $\mu_W$ with the (non-atomic) reference measure $\mu$ is defined by the formula

$$Z_{\mu,k}(f)=\frac1{k!}\int f(x_1,\dots,x_k)\mu_W(dx_1)\dots\mu_W(dx_k)=\frac1{k!}\sum_{1\le j_s\le M,\ 1\le s\le k}c(j_1,\dots,j_k)\mu_W(A_{j_1})\cdots\mu_W(A_{j_k}). \tag{7}$$


Then the definition of the Wiener-Itô integral can be extended to a general function satisfying relation (4) by means of an $L_2$-isomorphism. The details of this extension will not be discussed here.

Let me remark that the condition $c(j_1,\dots,j_k)=0$ if $j_s=j_{s'}$ for some $1\le s<s'\le k$ in the definition of an elementary function can be interpreted so that, similarly to the definition of the random integral $J_{n,k}(f)$ in (2), the diagonals are omitted from the domain of integration of a Wiener-Itô integral $Z_{\mu,k}(f)$.

The investigation of Wiener-Itô integrals is simpler than that of the random integrals $J_{n,k}(f)$ defined in (2) or of the $U$-statistics introduced in (3) because of the Gaussian property of the underlying white noise. Besides this, the study of Wiener-Itô integrals may help in understanding what kind of estimates can be expected in the solution of problems a) and b) or a') and b') and also in finding the proofs. Hence it is useful to consider the following two problems.

Problem a"). Give a good estimate of the probabilities $P(Z_{\mu,k}(f)>u)$ under some appropriate (and not too restrictive) conditions on the function $f$ and measure $\mu$.

Problem b"). Let $\mathcal F$ be a nice class of functions on the space $(X^k,\mathcal X^k)$. Give a good estimate of the probabilities $P\left(\sup_{f\in\mathcal F}Z_{\mu,k}(f)>u\right)$, where $Z_{\mu,k}(f)$ denotes again a Wiener-Itô integral with function $f$ and white noise with reference measure $\mu$.


In this paper the above problems will be discussed. Such estimates will be presented which depend on some basic characteristics of the random expressions $J_{n,k}(f)$, $I_{n,k}(f)$ or $Z_{\mu,k}(f)$. They will depend mainly on the $L_2$ and $L_\infty$-norm of the function $f$ taking part in the definition of the above quantities. (The $L_2$-norm of the function $f$ is closely related to the variance of the random variables we consider.) The proof of the estimates is related to some other problems interesting in themselves. My main goal was to explain the results and ideas behind them. I put emphasis on the explanation of the picture that can help in understanding them, and the details of almost all proofs are omitted. A detailed explanation together with the proofs can be found in my Lecture Note [22].

This paper consists of 9 sections. The first four sections contain the results about problems a), a') and a") together with some other statements which may explain their background better. Section 5 contains the main ideas of their proof. In Section 6 problems b), b') and b") are discussed together with some related questions. The main ideas of the proofs of the results in Section 6, which contain many unpleasant technical details, are discussed in Sections 7 and 8. In Section 9 Talagrand's theory about concentration inequalities is considered together with some new results and open questions.

2. The discussion of some large deviation results 

First we restrict our attention to problems a), a') and a"), i.e. to the case when the distribution of the random integral or $U$-statistic of one function is estimated. These problems are much simpler in the special case $k=1$. But they are not trivial even in this case. A discussion of some large deviation results may help to understand them better. I recall some large deviation results, but not in their most general form. Actually these results will not be needed later; they are interesting for the sake of some orientation.

Theorem 2.1 (Large deviation theorem about partial sums of independent and identically distributed random variables). Let $\xi_1,\xi_2,\dots$ be a sequence of independent and identically distributed random variables such that $E\xi_1=0$, $Ee^{t\xi_1}<\infty$ with some $t>0$. Let us define the partial sums $S_n=\sum_{j=1}^n\xi_j$, $n=1,2,\dots$. Then the relation

$$\lim_{n\to\infty}\frac1n\log P(S_n>nu)=-\rho(u)\quad\text{for all } u>0 \tag{8}$$

holds with the function $\rho(u)$ defined by the formula $\rho(u)=\sup_t\left(tu-\log Ee^{t\xi_1}\right)$.

The function $\rho(\cdot)$ in formula (8) has the following properties: $\rho(u)>0$ for all $u>0$, and it is a monotone increasing function; there is some number $0<A\le\infty$ depending on the distribution of the random variable $\xi_1$ such that $\rho(u)<\infty$ for $0<u<A$, and the asymptotic relation $\rho(u)=\frac{u^2}{2\sigma^2}+O(u^3)$ holds for small $u>0$, where $\sigma^2=E\xi_1^2$ is the variance of $\xi_1$.

The above theorem states that for all $\varepsilon>0$ the inequality $P(S_n>nu)\le e^{-n(\rho(u)-\varepsilon)}$ holds if $n\ge n(u,\varepsilon)$, and this estimate is essentially sharp. Actually, in nice cases, when the equation $\rho(u)=\sup_t\left(tu-\log Ee^{t\xi_1}\right)$ has a solution in $t$, the above inequality also holds with $\varepsilon=0$ for all $n\ge1$. The function $\rho(u)$ in the exponent of the above large deviation estimate strongly depends on the distribution of $\xi_1$. It is the so-called Legendre transform of $\log Ee^{t\xi_1}$, the logarithm of the moment generating function of $\xi_1$, and its values in an arbitrary interval determine the distribution of $\xi_1$. On the other hand, the estimate (8) for small $u>0$ shows some resemblance to the bound suggested by the central limit theorem. Indeed, for small $u>0$ it yields the upper bound $e^{-nu^2/2\sigma^2+nO(u^3)}$. The central limit theorem would suggest the estimate $e^{-nu^2/2\sigma^2}$. (Let us recall that the standard normal distribution function $\Phi(u)$ satisfies the inequality $\left(\frac1u-\frac1{u^3}\right)\frac{e^{-u^2/2}}{\sqrt{2\pi}}<1-\Phi(u)<\frac1u\frac{e^{-u^2/2}}{\sqrt{2\pi}}$ for all $u>0$, hence for large $u$ it is natural to bound $1-\Phi(u)$ by $e^{-u^2/2}$.)
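To make the Legendre transform in Theorem 2.1 more tangible, here is a minimal numerical sketch for one particular distribution, the symmetric sign variable with $P(\xi_1=1)=P(\xi_1=-1)=1/2$, for which $\log Ee^{t\xi_1}=\log\cosh t$ and $\sigma^2=1$. The function names, the search interval `t_max` and the choice $u=0.5$ are assumptions made only for this illustration; the closed form used for comparison is the well-known Legendre transform of $\log\cosh$.

```python
import math


def rate_function(log_mgf, u, t_max=50.0, iterations=200):
    """Approximate sup_{t >= 0} (t*u - log_mgf(t)) by ternary search.

    The map t -> t*u - log_mgf(t) is concave, because the logarithm of a
    moment generating function is convex. For u > 0 and a mean-zero
    variable the supremum over all real t is attained at some t >= 0.
    """
    lo, hi = 0.0, t_max
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if m1 * u - log_mgf(m1) < m2 * u - log_mgf(m2):
            lo = m1
        else:
            hi = m2
    t = 0.5 * (lo + hi)
    return t * u - log_mgf(t)


def log_mgf_sign(t):
    """log cosh(t), written in an overflow-safe form."""
    return abs(t) + math.log1p(math.exp(-2.0 * abs(t))) - math.log(2.0)


def rho_sign_exact(u):
    """Closed form of the Legendre transform of log cosh for 0 < u < 1."""
    return 0.5 * ((1 + u) * math.log(1 + u) + (1 - u) * math.log(1 - u))


u = 0.5
rho_numeric = rate_function(log_mgf_sign, u)
rho_exact = rho_sign_exact(u)
gaussian_exponent = u * u / 2.0  # u^2 / (2 sigma^2) with sigma^2 = 1
```

In this example the exact exponent is slightly larger than the Gaussian exponent $u^2/2\sigma^2$, in agreement with the asymptotic relation for small $u$ stated in the theorem.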
The next result I mention, Bernstein's inequality (see e.g. [H], 1.3.2 Bernstein's inequality), has a closer relation to the problems discussed in this paper. It gives a good upper bound on the distribution of sums of independent, bounded random variables with expectation zero. It is important that this estimate is universal: the constants it contains do not depend on the properties of the random variables we consider.

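Theorem 2.2 (Bernstein's inequality). Let $X_1,\dots,X_n$ be independent random variables, $P(|X_j|\le1)=1$, $EX_j=0$, $1\le j\le n$. Put $\sigma_j^2=EX_j^2$, $1\le j\le n$, $S_n=\sum_{j=1}^nX_j$ and $V_n^2=\operatorname{Var}S_n=\sum_{j=1}^n\sigma_j^2$. Then

$$P(S_n>u)\le\exp\left\{-\frac{u^2}{2V_n^2\left(1+\frac13\frac u{V_n^2}\right)}\right\}\quad\text{for all } u>0. \tag{9}$$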


Let us take a closer look at the content of Theorem 2.2. Estimate (9) yields a bound of different form if the first term is dominating in the sum $1+\frac13\frac u{V_n^2}$ in the denominator of the fraction in this expression and if the second term is dominating in it. If we fix some constant $C>0$, then formula (9) yields that $P(S_n>u)\le e^{-Bu^2/2V_n^2}$ with some constant $B=B(C)$ for $0\le u\le CV_n^2$. If, moreover, $0\le u\le\varepsilon V_n^2$ with some small $\varepsilon>0$, then the estimate $P(S_n>u)\le e^{-(1-K\varepsilon)u^2/2V_n^2}$ holds with a universal constant $K>0$. This means that in the case $0<u\le CV_n^2$ the tail behaviour of the distribution $F(u)=P(S_n>u)$ can be bounded by the distribution $G(u)=P(\text{const.}\,V_n\eta>u)$, where $\eta$ is a standard normal random variable, and $V_n^2$ is the variance of the partial sum $S_n$. If $0\le u\le\varepsilon V_n^2$ with a small $\varepsilon>0$, then it also can be bounded by $P((1-K\varepsilon)V_n\eta>u)$ with some universal constant $K>0$.

In the case $u\gg V_n^2$ formula (9) yields a different type of estimate. In this case we get that $P(S_n>u)<e^{-(3-\varepsilon)u/2}$ with a small $\varepsilon>0$, and this seems to be a rather weak estimate. In particular, it does not depend on the variance $V_n^2$ of $S_n$. In the degenerate case $V_n=0$, when $P(S_n>u)=0$, estimate (9) yields a strictly positive upper bound for $P(S_n>u)$. One would like to get such an improvement of Bernstein's inequality which gives a better bound in the case $u\gg V_n^2$. Bennett's inequality (see e.g. [2H|j Appendix B, 4 Bennett's inequality) satisfies this requirement.

Theorem 2.3 (Bennett's inequality). Let $X_1,\dots,X_n$ be independent random variables, $P(|X_j|\le1)=1$, $EX_j=0$, $1\le j\le n$. Put $\sigma_j^2=EX_j^2$, $1\le j\le n$, $S_n=\sum_{j=1}^nX_j$ and $V_n^2=\operatorname{Var}S_n=\sum_{j=1}^n\sigma_j^2$. Then

$$P(S_n>u)\le\exp\left\{-V_n^2\left[\left(1+\frac u{V_n^2}\right)\log\left(1+\frac u{V_n^2}\right)-\frac u{V_n^2}\right]\right\}\quad\text{for all } u>0. \tag{10}$$

As a consequence, for all $\varepsilon>0$ there exists some $B=B(\varepsilon)>0$ such that

$$P(S_n>u)\le\exp\left\{-(1-\varepsilon)u\log\frac u{V_n^2}\right\}\quad\text{if } u>BV_n^2, \tag{11}$$

and there exists some positive constant $K>0$ such that

$$P(S_n>u)\le\exp\left\{-Ku\log\frac u{V_n^2}\right\}\quad\text{if } u>2V_n^2. \tag{12}$$

Estimates (11) or (12) yield a slight improvement of Bernstein's inequality in the case $u\ge KV_n^2$ with a sufficiently large $K>0$. On the other hand, even this estimate is much weaker than the estimate suggested by a formal application of the central limit theorem. The question arises whether they are sharp or can be improved. The next example shows that inequalities (11) or (12) in Bennett's inequality are essentially sharp. If no additional restrictions are imposed, then at most the universal constants can be improved in them. Even a sum of independent, bounded and identically distributed random variables can be constructed which satisfies a lower bound similar to the upper bounds in formulas (11) and (12), only with possibly different constants.
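The size of the improvement can be seen by comparing the exponents of the two inequalities. The following sketch assumes the standard forms of (9) and (10), i.e. the Bernstein exponent $u^2/(2V_n^2(1+u/(3V_n^2)))$ and the Bennett exponent $V_n^2[(1+x)\log(1+x)-x]$ with $x=u/V_n^2$; the function names and the numbers are hypothetical.

```python
import math


def bernstein_exponent(u, variance):
    """Exponent in (9): u^2 / (2 V^2 (1 + u / (3 V^2)))."""
    return u * u / (2.0 * variance * (1.0 + u / (3.0 * variance)))


def bennett_exponent(u, variance):
    """Exponent in (10): V^2 ((1 + x) log(1 + x) - x) with x = u / V^2."""
    x = u / variance
    return variance * ((1.0 + x) * math.log1p(x) - x)


# Hypothetical numbers with u much larger than V^2 = 1.
variance = 1.0
rows = []
for u in (10.0, 100.0, 1000.0):
    # ratio of the Bennett exponent to u log(u / V^2), the exponent in (11)
    ratio = bennett_exponent(u, variance) / (u * math.log(u / variance))
    rows.append((u, bernstein_exponent(u, variance), bennett_exponent(u, variance), ratio))
```

For large $u/V_n^2$ the Bernstein exponent grows like $3u/2$, while the Bennett exponent grows like $u\log(u/V_n^2)$; the gain is only the logarithmic factor.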

Example 2.4. Let us fix some positive integer $n$, real numbers $u$ and $\sigma^2$ such that $0<\sigma^2\le\frac18$, $n>3u\ge6$ and $u>4n\sigma^2$. Put $V_n^2=n\sigma^2$ and take a sequence of independent, identically distributed random variables $X_1,\dots,X_n$ such that $P(X_j=1)=P(X_j=-1)=\frac{\sigma^2}2$, and $P(X_j=0)=1-\sigma^2$. Put $S_n=\sum_{j=1}^nX_j$. Then $ES_n=0$, $\operatorname{Var}S_n=V_n^2$, and

$$P(S_n\ge u)>\exp\left\{-Bu\log\frac u{V_n^2}\right\}$$

with some appropriate (universal) constant $B>0$.

Remark: The estimate of Example 2.4 or of relations (11) and (12) is well comparable with the tail distribution of a Poisson distributed random variable with parameter $\lambda=\text{const.}\,n\sigma^2\ge1$ at level $u\ge2\lambda$. Some calculation shows that a Poisson distributed random variable $\zeta_\lambda$ with parameter $\lambda\ge1$ satisfies the inequality $e^{-C_1u\log(u/\lambda)}\le P(\zeta_\lambda-E\zeta_\lambda>u)\le P(\zeta_\lambda>u)\le P(\zeta_\lambda-E\zeta_\lambda>\frac u2)\le e^{-C_2u\log(u/\lambda)}$ with some appropriate constants $0<C_2<C_1<\infty$ for all $u\ge2\lambda$, and $E\zeta_\lambda=\operatorname{Var}\zeta_\lambda=\lambda$. This estimate is similar to the above mentioned relations.

Example 2.4 is proved in Example 3.2 of my Lecture Note [22], but here I present a simpler proof.

Proof of the statement of Example 2.4. Let us fix an integer $u$ such that $n\ge3u$ and $u\ge4n\sigma^2$. Let $B=B(u)$ denote the event that among the random variables $X_j$, $1\le j\le n$, there are exactly $3u$ terms with values $+1$ or $-1$, and all other random variables $X_j$ equal zero. Let us also define the event $A=A(u)\subset B(u)$ which holds if $2u$ random variables $X_j$ are equal to $1$, $u$ random variables $X_j$ are equal to $-1$, and all remaining random variables $X_j$, $1\le j\le n$, are equal to zero. Clearly, $P(S_n\ge u)\ge P(A)=P(B)P(A|B)$. On the other hand,

$$P(B)=\binom n{3u}\sigma^{6u}(1-\sigma^2)^{n-3u}\ge\left(\frac{n\sigma^2}{3u}\right)^{3u}(1-\sigma^2)^{n-3u}\ge e^{-3u\log(3u/n\sigma^2)-4n\sigma^2}.$$

Here we exploited that because of the condition $\sigma^2\le\frac18$ we have $1-\sigma^2\ge e^{-4\sigma^2}$. Besides this, $u\ge4n\sigma^2$ and $P(B)\ge e^{-3u\log(3u/n\sigma^2)-u}\ge e^{-B_1u\log(u/n\sigma^2)}$ with some appropriate $B_1>0$ under our assumptions.

Let us consider a set of $3u$ elements, and choose a random subset of it by putting each element of this set into the random subset with probability $1/2$, independently of each other. I claim that the conditional probability $P(A|B)$ equals the probability that this random subset has $2u$ elements. Indeed, even the conditional probability of the event $A$ under the condition that for a prescribed set of indices $J\subset\{1,\dots,n\}$ with exactly $3u$ elements we have $X_j=\pm1$ if $j\in J$ and $X_j=0$ if $j\notin J$ equals the probability of the event that the above defined random subset has $2u$ elements. This is so, because under this condition the random variables $X_j$ take the value $+1$ with probability $1/2$ for all $j\in J$ independently of each other. Hence $P(A|B)=\binom{3u}{2u}2^{-3u}\ge e^{-Cu}\ge e^{-B_2u\log(u/n\sigma^2)}$ with some appropriate constants $C>0$ and $B_2>0$ under our conditions, since $\frac u{n\sigma^2}\ge4$ in this case. The estimates given for $P(B)$ and $P(A|B)$ imply the statement of Example 2.4.
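The argument of the proof can be followed numerically. The sketch below assumes the model of Example 2.4 in the form $P(X_j=\pm1)=\sigma^2/2$, $P(X_j=0)=1-\sigma^2$, so that $P(B)=\binom n{3u}\sigma^{6u}(1-\sigma^2)^{n-3u}$ and $P(A|B)=\binom{3u}{2u}2^{-3u}$, and it compares the resulting lower bound for $P(S_n\ge u)$ with the upper bound given by Bennett's inequality (10). The function names and the parameter values are hypothetical.

```python
import math


def log_prob_event_a(n, u, sigma2):
    """log P(A) = log P(B) + log P(A|B) for the events of the proof:
    P(B) = C(n, 3u) sigma2^(3u) (1 - sigma2)^(n - 3u),
    P(A|B) = C(3u, 2u) 2^(-3u)."""
    log_p_b = (math.log(math.comb(n, 3 * u))
               + 3 * u * math.log(sigma2)
               + (n - 3 * u) * math.log1p(-sigma2))
    log_p_a_given_b = math.log(math.comb(3 * u, 2 * u)) - 3 * u * math.log(2.0)
    return log_p_b + log_p_a_given_b


def bennett_log_bound(u, variance):
    """Logarithm of the right-hand side of (10)."""
    x = u / variance
    return -variance * ((1.0 + x) * math.log1p(x) - x)


# Hypothetical parameters satisfying n > 3u >= 6 and u > 4 n sigma^2.
n, u, sigma2 = 1000, 20, 0.001
variance = n * sigma2                      # V_n^2 = n sigma^2
lower = log_prob_event_a(n, u, sigma2)     # log of a lower bound for P(S_n >= u)
upper = bennett_log_bound(u, variance)     # log of an upper bound for P(S_n >= u)
scale = u * math.log(u / variance)         # u log(u / V_n^2)
```

Both `lower / scale` and `upper / scale` are negative numbers of moderate size for these parameters, which is the content of the statement that (11) and (12) are sharp up to the value of the constant in the exponent.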

Bernstein's inequality provides a solution to problems a) and a') in the case $k=1$ under some conditions. Because of the normalization (multiplication by $n^{-1/2}$ in these problems) it yields an estimate with the choice $\bar u=\sqrt n\,u$. Observe that $J_{n,1}(f)=\frac1{\sqrt n}\sum_{j=1}^n(f(\xi_j)-Ef(\xi_j))$ for $k=1$ in the definition (2). In problem a) it gives a good bound on $P(J_{n,1}(f)>u)$ for a function $f$ such that $|f(x)|\le\frac12$ for all $x\in X$ with the choice $X_j=f(\xi_j)-Ef(\xi_j)$, $1\le j\le n$, and $\bar u=\sqrt n\,u$. In problem a') it gives a good bound on $P(n^{-1/2}I_{n,1}(f)>u)$ under the condition $|f(x)|\le1$ for all $x\in X$, and $Ef(\xi_1)=0$ with the choice $X_j=f(\xi_j)$, $1\le j\le n$, and $\bar u=\sqrt n\,u$. This means that in the case $0<u\le C\sqrt n\sigma^2$ the bounds $P(J_{n,1}(f)>u)\le e^{-Ku^2/2\sigma^2}$ and $P(n^{-1/2}I_{n,1}(f)>u)\le e^{-Ku^2/2\sigma^2}$ hold with $\sigma^2=\operatorname{Var}f(\xi_1)$ and some constant $K=K(C)$ depending on the number $C$ if the above conditions are imposed in problem a) or a'). If $0<u\le\varepsilon\sqrt n\sigma^2$ with some small $\varepsilon>0$, then the above constant $K$ can be chosen very close to the number 1.
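The substitution behind these bounds can be made explicit. Assuming that Bernstein's estimate (9) has the standard form $\exp\{-u^2/(2V_n^2(1+u/(3V_n^2)))\}$, and writing $\bar u=\sqrt n\,u$ and $V_n^2=n\sigma^2$ with $\sigma^2=\operatorname{Var}f(\xi_1)$, one gets, under the boundedness conditions stated above,

```latex
P\bigl(J_{n,1}(f)>u\bigr)=P\bigl(S_n>\bar u\bigr)
\le\exp\left\{-\frac{\bar u^2}{2V_n^2\left(1+\frac{\bar u}{3V_n^2}\right)}\right\}
=\exp\left\{-\frac{u^2}{2\sigma^2\left(1+\frac{u}{3\sqrt n\,\sigma^2}\right)}\right\}.
```

The correction term $u/(3\sqrt n\,\sigma^2)$ in the denominator is bounded by $C/3$ if $0<u\le C\sqrt n\sigma^2$, and it is small if $u\le\varepsilon\sqrt n\sigma^2$ with a small $\varepsilon>0$. The same computation applies to $n^{-1/2}I_{n,1}(f)$ in problem a').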

The above results can be interpreted so that in the case $0<u\le\text{const.}\,\sqrt n\sigma^2$ and a bounded function $f$ an estimate suggested by the central limit theorem holds for problem a), only an additional constant multiplier may appear in the exponent. A similar statement holds in problem a'), only here the additional condition $Ef(\xi_j)=0$ has to be imposed. On the other hand, the situation is quite different if $u\gg\sqrt n\sigma^2$. In this case Bernstein's inequality yields only a very weak estimate. Bennett's inequality gives a slight improvement. It yields the inequality $P(J_{n,1}(f)>u)<e^{-Bu\sqrt n\log(u/\sqrt n\sigma^2)}$ with an appropriate constant $B>0$ if $|f(x)|\le\frac12$ for all $x\in X$, $u\ge2\sqrt n\sigma^2$, and $\sigma^2=\operatorname{Var}f(\xi_1)$. The estimate $P(n^{-1/2}I_{n,1}(f)>u)<e^{-Bu\sqrt n\log(u/\sqrt n\sigma^2)}$ holds with an appropriate $B>0$ if $|f(x)|\le1$ for all $x\in X$, $Ef(\xi_1)=0$, $\operatorname{Var}f(\xi_1)=\sigma^2$, and $u\ge2\sqrt n\sigma^2$. These estimates are much weaker than the bound suggested by a formal application of the central limit theorem. On the other hand, as Example 2.4 shows, no better estimate can be expected in this case. Moreover, the proof of this example gives some insight why a different type of estimate appears in the cases $u\le\sqrt n\sigma^2$ and $u\gg\sqrt n\sigma^2$ for problems a) and a').

In the proof of Example 2.4 a 'bad' irregular event $A$ was defined such that if it holds, then the sum of the random variables considered in this example is sufficiently large. Generally, the probability of such an event is very small, but if the variance of the random variables is very small (in problems a) and a') this is the case if $\sigma^2\ll un^{-1/2}$), then such 'bad' irregular events can be defined whose probabilities are not negligible.

Problems a) and a') will also be considered for $k\ge2$, and this will be called the multivariate case. The results we get for the solution of problems a) and a') in the multivariate case are very similar to the results described above. To understand them, some problems have to be discussed first. In particular, the answer to the following two questions has to be understood:

Question a). In the solution of problem a') in the case $k=1$ the condition $Ef(\xi_1)=0$ was imposed, and this means some kind of normalization. What condition corresponds to it in the multivariate case? This question leads to the definition of degenerate $U$-statistics and to the so-called Hoeffding's decomposition of $U$-statistics to a sum of degenerate $U$-statistics.

Question b). The discussion of problems a) and a') was based on the central limit theorem. What kind of limit theorems can take its place in the multivariate case? What kind of limit theorems hold for $U$-statistics $I_{n,k}(f)$ or multiple random integrals $J_{n,k}(f)$ defined in (2)? The limit appearing in these problems can be expressed by means of multiple Wiener-Itô integrals in a natural way.

In the next section the two above questions will be discussed. 


3. On some problems about U-statistics and random integrals
3.1. The normalization of U-statistics 

In the case $k=1$ problem a') means the estimation of sums of independent and identically distributed random variables. In this case a good estimate was obtained under the condition $Ef(\xi_1)=0$.

In the multivariate case $k\ge2$ a stronger normalization property has to be imposed to get good estimates about the distribution of $U$-statistics. In this case it has to be assumed that the conditional expectations of the terms $f(\xi_{j_1},\dots,\xi_{j_k})$ of the $U$-statistic, under the condition that all but one of the arguments take prescribed values, equal zero. This property is formulated in a more explicit way in the following definition of degenerate $U$-statistics.

Definition of degenerate $U$-statistics. Let us consider the $U$-statistic $I_{n,k}(f)$ of order $k$ defined in formula (3) with kernel function $f(x_1,\dots,x_k)$ and a sequence of independent and identically distributed random variables $\xi_1,\dots,\xi_n$. It is a degenerate $U$-statistic if its kernel function satisfies the relation

$$E\bigl(f(\xi_1,\dots,\xi_k)\,\big|\,\xi_1=x_1,\dots,\xi_{j-1}=x_{j-1},\xi_{j+1}=x_{j+1},\dots,\xi_k=x_k\bigr)=0\quad\text{for all } 1\le j\le k \text{ and } x_s\in X,\ s\in\{1,\dots,k\}\setminus\{j\}. \tag{13}$$


The definition of degenerate $U$-statistics is closely related to the notion of canonical functions described below.

Definition of canonical functions. A function $f(x_1,\dots,x_k)$ defined on the $k$-fold product $(X^k,\mathcal X^k)$ of a measurable space $(X,\mathcal X)$ is called canonical with respect to a probability measure $\mu$ on $(X,\mathcal X)$ if

$$\int f(x_1,\dots,x_{j-1},u,x_{j+1},\dots,x_k)\mu(du)=0\quad\text{for all } 1\le j\le k \text{ and } x_s\in X,\ s\in\{1,\dots,k\}\setminus\{j\}. \tag{14}$$


It is clear that a $U$-statistic $I_{n,k}(f)$ is degenerate if and only if its kernel function $f$ is canonical with respect to the distribution $\mu$ of the random variables $\xi_1,\dots,\xi_n$ appearing in the definition of the $U$-statistic.

Given a function $f$ and a probability measure $\mu$, this function can be written as a sum of canonical functions (with different sets of arguments) with respect to the measure $\mu$, and this enables us to decompose a $U$-statistic as a linear combination of degenerate $U$-statistics. This is the content of Hoeffding's decomposition of $U$-statistics described below. To formulate it I first introduce some notation.

Consider the $k$-fold product $(X^k,\mathcal X^k,\mu^k)$ of a measure space $(X,\mathcal X,\mu)$ with some probability measure $\mu$, and define for all integrable functions $f(x_1,\dots,x_k)$ and indices $1\le j\le k$ the projection $P_jf$ of the function $f$ to its $j$-th coordinate as

$$P_jf(x_1,\dots,x_{j-1},x_{j+1},\dots,x_k)=\int f(x_1,\dots,x_k)\mu(dx_j),\quad 1\le j\le k. \tag{15}$$

In some investigations it may be useful to rewrite formula (15) by means of conditional expectations in an equivalent form as

$$P_jf(x_1,\dots,x_{j-1},x_{j+1},\dots,x_k)=E\bigl(f(\xi_1,\dots,\xi_k)\,\big|\,\xi_1=x_1,\dots,\xi_{j-1}=x_{j-1},\xi_{j+1}=x_{j+1},\dots,\xi_k=x_k\bigr),$$

where $\xi_1,\dots,\xi_k$ are independent random variables with distribution $\mu$.

Let us also define the operators $Q_j=I-P_j$ as $Q_jf=f-P_jf$ on the space of integrable functions on $(X^k,\mathcal X^k,\mu^k)$, $1\le j\le k$. In the definition (15) $P_jf$ is a function not depending on the coordinate $x_j$, but in the definition of $Q_j$ we introduce the fictive coordinate $x_j$ to make the expression $Q_jf=f-P_jf$ meaningful. The following result holds.

Theorem 3.1 (Hoeffding's decomposition of $U$-statistics). Let an integrable function $f(x_1,\dots,x_k)$ be given on the $k$-fold product space $(X^k,\mathcal X^k,\mu^k)$ of a space $(X,\mathcal X,\mu)$ with a probability measure $\mu$. It has the decomposition

$$f=\sum_{V\subset\{1,\dots,k\}}f_V\quad\text{with}\quad f_V(x_j,\ j\in V)=\Biggl(\prod_{j\in\{1,\dots,k\}\setminus V}P_j\prod_{j\in V}Q_j\Biggr)f(x_1,\dots,x_k) \tag{16}$$

such that all functions $f_V$, $V\subset\{1,\dots,k\}$, in (16) are canonical with respect to the probability measure $\mu$, and they depend on the $|V|$ arguments $x_j$, $j\in V$.

Let $\xi_1,\dots,\xi_n$ be a sequence of independent, $\mu$ distributed random variables, and consider the $U$-statistics $I_{n,k}(f)$ and $I_{n,|V|}(f_V)$ corresponding to the kernel functions $f$, $f_V$ defined in (16) and random variables $\xi_1,\dots,\xi_n$. Then

$$I_{n,k}(f)=\sum_{V\subset\{1,\dots,k\}}(n-|V|)(n-|V|-1)\cdots(n-k+1)\frac{|V|!}{k!}I_{n,|V|}(f_V) \tag{17}$$

is a representation of $I_{n,k}(f)$ as a sum of degenerate $U$-statistics, where $|V|$ denotes the cardinality of the set $V$. (The product $(n-|V|)(n-|V|-1)\cdots(n-k+1)$ is defined as 1 if $V=\{1,\dots,k\}$, i.e. $|V|=k$.) This representation is called the Hoeffding decomposition of $I_{n,k}(f)$.

Hoeffding's decomposition was originally proved in paper m. It may also be interesting to mention its generalization in [32].

I omit the proof of Theorem 3.1, although it is fairly simple. I only try to briefly explain that the construction of Hoeffding's decomposition is natural. Let me recall that a random variable can be decomposed as a sum of a random variable with expectation zero plus a constant, and the random variable with expectation zero in this decomposition is defined by taking out from the original random variable its expectation. To introduce such a transformation which turns to zero not only the expectation of the transformed random variable, but also its conditional expectation with respect to some condition, it is natural to take out from the original random variable its conditional expectation. Since the operators $P_j$ defined in (15) are closely related to the conditional expectations appearing in the definition of degenerate $U$-statistics, the above consideration makes it natural to write the identity $f=\prod_{j=1}^k(P_j+Q_j)f=\sum_{V\subset\{1,\dots,k\}}f_V$ with the functions $f_V$ defined in (16). (In the justification of the last formula some properties of the operators $P_j$ and $Q_j$ have to be exploited.)
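The identity above can be made concrete in the simplest multivariate case $k=2$ on a finite space, where the integrals in (15) are finite sums. The following Python sketch is only an illustration under these assumptions; the function name `hoeffding_k2`, the three-point space, the weights and the kernel are hypothetical. It computes the four terms $f_\emptyset=P_1P_2f$, $f_{\{1\}}=Q_1P_2f$, $f_{\{2\}}=P_1Q_2f$ and $f_{\{1,2\}}=Q_1Q_2f$.

```python
from itertools import product


def hoeffding_k2(f, points, weights):
    """Hoeffding decomposition for k = 2 on a finite probability space.

    `points` are the atoms of the space and `weights` their probabilities.
    Returns the constant f_empty, the one-variable functions f_1, f_2 and
    the two-variable function f_12 as dictionaries.
    """
    mu = dict(zip(points, weights))
    # P_1 f and P_2 f: integrate out the first or the second coordinate
    p1 = {y: sum(mu[x] * f(x, y) for x in points) for y in points}
    p2 = {x: sum(mu[y] * f(x, y) for y in points) for x in points}
    f_empty = sum(mu[x] * p2[x] for x in points)           # P_1 P_2 f
    f_1 = {x: p2[x] - f_empty for x in points}             # Q_1 P_2 f
    f_2 = {y: p1[y] - f_empty for y in points}             # P_1 Q_2 f
    f_12 = {(x, y): f(x, y) - p1[y] - p2[x] + f_empty      # Q_1 Q_2 f
            for x, y in product(points, repeat=2)}
    return f_empty, f_1, f_2, f_12


def kernel(x, y):
    """A non-symmetric kernel chosen only for illustration."""
    return (x + 1) * (y * y + x)


points = [0, 1, 2]
weights = [0.5, 0.3, 0.2]
f_empty, f_1, f_2, f_12 = hoeffding_k2(kernel, points, weights)
```

Summing the four terms recovers the kernel, and each non-constant term integrates to zero in each of its arguments, which is the canonical property (14).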

It is clear that $EI_{n,k}(f)=0$ for a degenerate $U$-statistic. Also the inequality

$$E\left(n^{-k/2}I_{n,k}(f)\right)^2\le\frac{\sigma^2}{k!}\quad\text{with}\quad\sigma^2=\int f^2(x_1,\dots,x_k)\mu(dx_1)\dots\mu(dx_k) \tag{18}$$

holds if $I_{n,k}(f)$ is a degenerate $U$-statistic. The measure $\mu$ in (18) is the distribution of the random variables $\xi_j$ taking part in the definition of the $U$-statistic. Moreover, $\lim_{n\to\infty}n^{-k}E(I_{n,k}(f))^2=\frac{\sigma^2}{k!}$ if the kernel function $f$ is a symmetric function of its arguments, i.e. $f(x_1,\dots,x_k)=f(x_{\pi(1)},\dots,x_{\pi(k)})$ for all permutations $\pi=(\pi(1),\dots,\pi(k))$ of the set $\{1,\dots,k\}$.

Relation (18) can be proved by means of the observation that

$$Ef(\xi_{j_1},\dots,\xi_{j_k})f(\xi_{j'_1},\dots,\xi_{j'_k})=0$$

if $\{j_1,\dots,j_k\}\ne\{j'_1,\dots,j'_k\}$, and $f$ is a canonical function with respect to the distribution $\mu$ of the random variables $\xi_j$. On the other hand,

$$\left|Ef(\xi_{j_1},\dots,\xi_{j_k})f(\xi_{j'_1},\dots,\xi_{j'_k})\right|\le\int f^2(x_1,\dots,x_k)\mu(dx_1)\dots\mu(dx_k)$$

by the Schwarz inequality if $\{j_1,\dots,j_k\}=\{j'_1,\dots,j'_k\}$, i.e. if the sequence of indices $j'_1,\dots,j'_k$ is a permutation of the sequence of indices $j_1,\dots,j_k$, and there is an identity in this relation if the function $f$ is symmetric. The last formula enables us to check the asymptotic relation given for $E(I_{n,k}(f))^2$ after relation (18).

Relation (18) suggests restricting our attention in the investigation of problem a') to degenerate $U$-statistics, and it also explains why the normalization $n^{-k/2}$ was chosen in it. For degenerate $U$-statistics with this normalization such an upper bound can be expected in problem a') which does not depend on the sample size $n$. The estimation of the distribution of a general $U$-statistic can be reduced to the degenerate case by means of Hoeffding's decomposition (Theorem 3.1).

The random integrals $J_{n,k}(f)$ are defined in (2) by means of integration with
respect to the signed measure $\mu_n - \mu$, and this means some sort of normalization.
This normalization has the consequence that the distributions of these
integrals satisfy a good estimation for rather general kernel functions $f$. Beside
this, a random integral $J_{n,k}(f)$ can be written as a sum of U-statistics to
which the Hoeffding decomposition can be applied. Hence it can be rewritten
as a linear combination of degenerate U-statistics. In the next result I describe
the representation of $J_{n,k}(f)$ we get in such a way. It shows that the implicit
normalization caused by integration with respect to $\mu_n - \mu$ has a serious
cancellation effect. This enables us to get a good solution for problem a) or b) if
we have a good solution for problem a') or b'). Unfortunately, the proof of this 
result demands rather unpleasant calculations. Hence here I omit the proof. It 
can be found in m or in Theorem 9.4 of m 
Theorem 3.2. Let us have a non-atomic measure $\mu$ on a measurable space
$(X, \mathcal{X})$ together with a sequence of independent, $\mu$-distributed random variables
$\xi_1,\dots,\xi_n$, and take a function $f(x_1,\dots,x_k)$ of $k$ variables on the space $(X^k, \mathcal{X}^k)$
such that

$$\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k) < \infty.$$

Let us consider the empirical distribution function $\mu_n$ of the sequence $\xi_1,\dots,\xi_n$
introduced in (1) together with the k-fold random integral $J_{n,k}(f)$ of the function
$f$ defined in (2). The identity

$$J_{n,k}(f) = \sum_{V \subset \{1,\dots,k\}} C(n,k,V)\, n^{-|V|/2} I_{n,|V|}(f_V) \tag{19}$$

holds with the canonical (with respect to the measure $\mu$) functions $f_V(x_j,\ j \in V)$
defined in the Hoeffding decomposition and appropriate real numbers $C(n,k,V)$, $V \subset \{1,\dots,k\}$, where
$I_{n,|V|}(f_V)$ is the (degenerate) U-statistic with kernel function $f_V$ and random
sequence $\xi_1,\dots,\xi_n$, as defined earlier. The constants $C(n,k,V)$ in (19) satisfy the
relations $|C(n,k,V)| \le C(k)$ with some constant $C(k)$ depending only on the
order $k$ of the integral $J_{n,k}(f)$, $\lim_{n\to\infty} C(n,k,V) = C(k,V)$ with some constant
$C(k,V) < \infty$ for all $V \subset \{1,\dots,k\}$, and $C(n,k,\{1,\dots,k\}) = 1$ for $V = \{1,\dots,k\}$.

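Let us also remark that the functions $f_V$ of the Hoeffding decomposition satisfy the inequalities

$$\int f_V^2(x_j,\ j \in V) \prod_{j \in V} \mu(dx_j) \le \int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k) \tag{20}$$

and

$$\sup_{x_j,\ j \in V} |f_V(x_j,\ j \in V)| \le 2^{|V|} \sup_{x_j,\ 1 \le j \le k} |f(x_1,\dots,x_k)| \tag{21}$$

for all $V \subset \{1,\dots,k\}$.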

The decomposition of the random integral $J_{n,k}(f)$ in formula (19) is similar to
the Hoeffding decomposition of general U-statistics presented in Theorem 3.1.
The main difference between them is that the coefficients of the normalized
degenerate U-statistics $n^{-|V|/2} I_{n,|V|}(f_V)$ at the right-hand side of formula (19)
can be bounded by a universal constant depending neither on the sample size
$n$, nor on the kernel function $f$ of the random integral. This fact has important
consequences.

Theorem 3.2 enables us to get good estimates for problem a) if we have such
estimates for problem a'). In particular, formulas (18), (19) and (20) yield good
bounds on the expectation and variance of the random integral $J_{n,k}(f)$. The
inequalities

$$E\left(J_{n,k}(f)\right)^2 \le C\sigma^2 \quad\text{and}\quad |E J_{n,k}(f)| \le C\sigma, \quad \text{with } \sigma^2 = \int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k) \tag{22}$$

hold with some universal constant $C > 0$ depending only on the order of the
random integral $J_{n,k}(f)$.

Relation (22) yields such an estimate for the second moment of $J_{n,k}(f)$ as
we expect. On the other hand, although it gives a sufficiently good bound on
its first moment, it does not state that the expectation of $J_{n,k}(f)$ equals zero.
Indeed, formula (19) only gives that $|E J_{n,k}(f)| = |C(n,k,\emptyset) f_\emptyset| \le C|f_\emptyset| =
C\left|\int f(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k)\right| \le C\sigma$ with some appropriate constant $C > 0$.
The following example shows that $E J_{n,k}(f)$ need not always be zero. (To
understand better why such a situation may appear observe that the random
measures $(\mu_n - \mu)(B_1)$ and $(\mu_n - \mu)(B_2)$ are not independent for disjoint sets $B_1$
and $B_2$.)

Let us consider a random integral $J_{n,2}(f)$ of order 2 with an appropriate
kernel function $f$. Beside this, choose a sequence of independent random variables
$\xi_1,\dots,\xi_n$ with uniform distribution on the unit interval $[0,1]$ and denote
its empirical distribution by $\mu_n$. We shall consider the example where the
kernel function $f = f(x,y)$ is the indicator function of the unit square, i.e.
$f(x,y) = 1$ if $0 \le x, y \le 1$, and $f(x,y) = 0$ otherwise. The random integral
$J_{n,2}(f) = \frac{n}{2} \int_{x \ne y} f(x,y)(\mu_n(dx) - dx)(\mu_n(dy) - dy)$ will be taken, and its
expected value $E J_{n,2}(f)$ will be calculated. By adding the diagonal $x = y$ to the
domain of integration and taking out the contribution obtained in this way we
get that $E J_{n,2}(f) = \frac{n}{2} E\left(\int_0^1 (\mu_n(dx) - dx)\right)^2 - \frac{n}{2} \cdot \frac{1}{n} = -\frac{1}{2}$, i.e. the expected
value of this random integral is not equal to zero. (The last term is the integral
of the function $f(x,y)$ on the diagonal $x = y$ with respect to the product
measure $\mu_n \times \mu_n$ which equals $(\mu_n - \mu) \times (\mu_n - \mu)$ on the diagonal. The first term
vanishes, because $\int_0^1 (\mu_n(dx) - dx) = 0$.)
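The calculation of this example can be replayed numerically. The sketch below assumes the normalization $J_{n,2}(f) = \frac{n}{2}\int_{x \ne y} f(x,y)(\mu_n(dx) - \mu(dx))(\mu_n(dy) - \mu(dy))$ and restricts itself to product kernels $f(x,y) = g(x)h(y)$ with known integrals of $g$ and $h$; the function name and its parameters are hypothetical helpers introduced for illustration.

```python
import random


def j_n2_product_kernel(sample, g, h, int_g, int_h):
    """(n/2) * off-diagonal integral of g(x) h(y) w.r.t. (mu_n - mu) x (mu_n - mu).

    int_g and int_h are the integrals of g and h with respect to the
    (non-atomic) measure mu; they must be supplied by the caller.
    """
    n = len(sample)
    gs = [g(x) for x in sample]
    hs = [h(x) for x in sample]
    # mu_n x mu_n part: all pairs (j, l) minus the diagonal pairs j == l.
    emp_emp = (sum(gs) * sum(hs) - sum(a * b for a, b in zip(gs, hs))) / n ** 2
    # Mixed and mu x mu parts: the diagonal is a null set because mu has no atoms.
    emp_mu = (sum(gs) / n) * int_h
    mu_emp = int_g * (sum(hs) / n)
    mu_mu = int_g * int_h
    return (n / 2) * (emp_emp - emp_mu - mu_emp + mu_mu)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.random() for _ in range(25)]
    # Indicator of the unit square: g = h = 1 on [0, 1].
    print(j_n2_product_kernel(xs, lambda x: 1.0, lambda y: 1.0, 1.0, 1.0))
```

For the indicator of the unit square the value does not depend on the sample at all: up to rounding it equals $-1/2$ for every realization, so in particular its expectation is non-zero.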

Now I turn to the second problem discussed in this section. 


3.2. Limit theorems for U-statistics and random integrals 

The following limit theorem about normalized degenerate U-statistics will be
interesting for us.

Theorem 3.3 (Limit theorem about normalized degenerate U-statistics).
Let us consider a sequence of degenerate U-statistics $I_{n,k}(f)$ of order $k$,
$n = k, k+1, \dots$, defined earlier with the help of a kernel function $f(x_1,\dots,x_k)$
on the k-fold product $(X^k, \mathcal{X}^k)$ of a measurable space $(X, \mathcal{X})$, canonical with
respect to some non-atomic probability measure $\mu$ on $(X, \mathcal{X})$ and such that
$\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k) < \infty$ together with a sequence of independent
and identically distributed random variables $\xi_1, \xi_2, \dots$ with distribution $\mu$ on
$(X, \mathcal{X})$. The sequence of normalized U-statistics $n^{-k/2} I_{n,k}(f)$ converges in
distribution, as $n \to \infty$, to the k-fold Wiener-Itô integral

$$Z_{\mu,k}(f) = \frac{1}{k!} \int f(x_1,\dots,x_k)\,\mu_W(dx_1)\dots\mu_W(dx_k)$$

with kernel function $f(x_1,\dots,x_k)$ and a white noise $\mu_W$ with reference measure $\mu$.

The proof of Theorem 3.3 can be found for instance in [S]. Here I present a
heuristic explanation which can be considered as a sketch of proof.

To understand Theorem 3.3 it is useful to rewrite the normalized degenerate
U-statistics considered in it in the form of multiple random integrals with respect
to a normalized empirical measure. The identity

$$n^{-k/2} I_{n,k}(f) = \frac{n^{k/2}}{k!} \int' f(x_1,\dots,x_k)\,\mu_n(dx_1)\dots\mu_n(dx_k)$$
$$= \frac{n^{k/2}}{k!} \int' f(x_1,\dots,x_k)\,(\mu_n(dx_1) - \mu(dx_1))\dots(\mu_n(dx_k) - \mu(dx_k)) \tag{23}$$

holds, where $\mu_n$ is the empirical distribution function of the sequence $\xi_1,\dots,\xi_n$
defined in (1), and the prime in $\int'$ denotes that the diagonals, i.e. the points
$x = (x_1,\dots,x_k)$ such that $x_j = x_{j'}$ for some pairs of indices $1 \le j, j' \le k$, $j \ne j'$,
are omitted from the domain of integration. The last identity of formula (23)
holds, because in the case of a function $f(x_1,\dots,x_k)$ canonical with respect
to a non-atomic measure $\mu$ we get the same result by integrating with respect
to $\mu_n(dx_j)$ and with respect to $\mu_n(dx_j) - \mu(dx_j)$. (The non-atomic property
of the measure $\mu$ is needed to guarantee that the integrals with respect to the
measure $\mu$ considered in this formula remain zero if the diagonals are omitted
from the domain of integration.)

Formula (23) may help to understand Theorem 3.3, because the random
fields $n^{1/2}(\mu_n(A) - \mu(A))$, $A \in \mathcal{X}$, converge to a Gaussian field $\nu(A)$, $A \in \mathcal{X}$,
as $n \to \infty$, and this suggests a limit similar to the result of Theorem 3.3. But
it is not so simple to carry out a limiting procedure leading to the proof of
Theorem 3.3 with the help of formula (23). Some problems arise, because the
fields $n^{1/2}(\mu_n - \mu)$ converge to a Gaussian field which is not of white noise type. The limit
we get is similar to a Wiener bridge on the real line. Hence a relation between
Wiener processes and Wiener bridges suggests to write the following version of
formula (23). Let $\eta$ be a standard Gaussian random variable, independent of
the random sequence $\xi_1, \xi_2, \dots$. We can write, by exploiting again the canonical
property of the function $f$, the identity

$$n^{-k/2} I_{n,k}(f) = \frac{n^{k/2}}{k!} \int' f(x_1,\dots,x_k) \left(\mu_n(dx_1) - \mu(dx_1) + \frac{\eta}{\sqrt{n}}\mu(dx_1)\right)
\dots \left(\mu_n(dx_k) - \mu(dx_k) + \frac{\eta}{\sqrt{n}}\mu(dx_k)\right). \tag{24}$$

(Since $f$ is canonical, adding a multiple of $\mu(dx_j)$ to the integrator does not change the
value of the integral.) The random measures $n^{1/2}\left(\mu_n - \mu + \frac{\eta}{\sqrt{n}}\mu\right)$ converge to a white noise with
reference measure $\mu$, hence a limiting procedure in formula (24) yields Theorem 3.3.
Moreover, in the case of elementary functions $f$ the central limit theorem and
formula (24) imply the statement of Theorem 3.3 directly. (Elementary functions
were defined earlier.) After this, Theorem 3.3 can be proved in the
general case with the help of the investigation of the $L_2$-contraction property of
some operators. I omit the details.

A similar limit theorem holds for random integrals $J_{n,k}(f)$. It can be proved
by means of Theorem 3.2 and an adaptation of the above sketched argument
for the proof of Theorem 3.3. It states the following result.

Theorem 3.4 (Limit theorem about multiple random integrals $J_{n,k}(f)$).
Let us have a sequence of independent and identically distributed random variables
$\xi_1, \xi_2, \dots$ with some non-atomic distribution $\mu$ on a measurable space
$(X, \mathcal{X})$ and a function $f(x_1,\dots,x_k)$ on the k-fold product $(X^k, \mathcal{X}^k)$ of the space
$(X, \mathcal{X})$ such that

$$\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k) < \infty.$$

Let us consider for all $n = 1, 2, \dots$ the random integrals $J_{n,k}(f)$ of order $k$
defined in formulas (1) and (2) with the help of the empirical distribution $\mu_n$
of the sequence $\xi_1,\dots,\xi_n$ and the function $f$. The random integrals $J_{n,k}(f)$
converge in distribution, as $n \to \infty$, to the following sum $U(f)$ of multiple
Wiener-Itô integrals:

$$U(f) = \sum_{V \subset \{1,\dots,k\}} C(k,V)\, Z_{\mu,|V|}(f_V)
= \sum_{V \subset \{1,\dots,k\}} \frac{C(k,V)}{|V|!} \int f_V(x_j,\ j \in V) \prod_{j \in V} \mu_W(dx_j),$$

where the functions $f_V(x_j,\ j \in V)$, $V \subset \{1,\dots,k\}$, are those functions
which appear in the Hoeffding decomposition of the function
$f(x_1,\dots,x_k)$, the constants $C(k,V)$ are the limits appearing in the limit relation
$\lim_{n\to\infty} C(n,k,V) = C(k,V)$ satisfied by the quantities $C(n,k,V)$ in formula (19),
and $\mu_W$ is a white noise with reference measure $\mu$.

The results of this section suggest that to understand what kind of results can 
be expected for the solution of problems a) and a') it is useful to study first their 
simpler counterpart, problem a") about multiple Wiener-Ito integrals. They also 
show that problem a') is interesting in the case when degenerate U-statistics are
investigated. The next section contains some results about these problems. 


4. Estimates on the distribution of random integrals and U-statistics 


First I formulate the results about the solution of problem a''), about the tail
behaviour of multiple Wiener-Itô integrals.

Theorem 4.1. Let us consider a $\sigma$-finite measure $\mu$ on a measurable space
$(X, \mathcal{X})$ together with a white noise $\mu_W$ with reference measure $\mu$. Let us have a
real-valued function $f(x_1,\dots,x_k)$ on the space $(X^k, \mathcal{X}^k)$ which satisfies the
$L_2$-bound $\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k) \le \sigma^2$ with some $\sigma^2 < \infty$.
Take the random integral $Z_{\mu,k}(f)$ introduced earlier. It satisfies the inequality

$$P(k!\,|Z_{\mu,k}(f)| > u) \le C \exp\left\{-\frac{1}{2}\left(\frac{u}{\sigma}\right)^{2/k}\right\} \quad \text{for all } u > 0 \tag{25}$$

with an appropriate constant $C = C(k) > 0$ depending only on the multiplicity
$k$ of the integral.

The proof of Theorem 4.1 can be found in an earlier paper of mine together with the
following example which shows that it gives a sharp estimate.

Example 4.2. Let us have a $\sigma$-finite measure $\mu$ on some measurable space
$(X, \mathcal{X})$ together with a white noise $\mu_W$ on $(X, \mathcal{X})$ with reference measure $\mu$. Let
$f_0(x)$ be a real valued function on $(X, \mathcal{X})$ such that $\int f_0(x)^2\,\mu(dx) = 1$, and take
the function $f(x_1,\dots,x_k) = \sigma f_0(x_1) \cdots f_0(x_k)$ with some number $\sigma > 0$ and the
Wiener-Itô integral $Z_{\mu,k}(f)$ introduced earlier. Then the relation

$$\int f(x_1,\dots,x_k)^2\,\mu(dx_1)\dots\mu(dx_k) = \sigma^2$$

holds, and the Wiener-Itô integral $Z_{\mu,k}(f)$ satisfies the inequality

$$P(k!\,|Z_{\mu,k}(f)| > u) \ge \frac{\bar{C}}{\left(\frac{u}{\sigma}\right)^{1/k} + 1} \exp\left\{-\frac{1}{2}\left(\frac{u}{\sigma}\right)^{2/k}\right\} \quad \text{for all } u > 0 \tag{26}$$

with some constant $\bar{C} > 0$.

Let us also remark that a Wiener-Itô integral $Z_{\mu,k}(f)$ with a
kernel function $f$ satisfying the above $L_2$-bound also satisfies the relations $E Z_{\mu,k}(f) = 0$
and $E Z_{\mu,k}(f)^2 \le \frac{\sigma^2}{k!}$ with the number $\sigma^2$ in that bound. If the function $f$ is symmetric,
i.e. if $f(x_1,\dots,x_k) = f(x_{\pi(1)},\dots,x_{\pi(k)})$ for all permutations $\pi$ of the set
$\{1,\dots,k\}$, then in the last relation identity can be written instead of inequality.
Beside this, $Z_{\mu,k}(f) = Z_{\mu,k}(\mathrm{Sym} f)$, where $\mathrm{Sym} f$ denotes the symmetrization
of the function $f$, and this means that we can restrict our attention to the
Wiener-Itô integrals of symmetric functions without violating the generality.
Hence Theorem 4.1 can be interpreted in the following way. The random integral
$k!\,Z_{\mu,k}(f)$ has expectation zero, its variance is less than or equal to $k!\,\sigma^2$
under the conditions of this result, and there is identity in this relation if $f$ is
a symmetric function. Beside this, the distribution of $k!\,Z_{\mu,k}(f)$ satisfies an estimate
similar to that of $\sigma\eta^k$, where $\eta$ is a standard normal random variable. The
estimate (25) in Theorem 4.1 is not always sharp, but Example 4.2 shows that
there are cases when the expression in its exponent cannot be improved.

Let me also remark that the above statement can be formulated in a slightly
nicer form if the distribution of $k!\,Z_{\mu,k}(f)$ is compared not with that of $\sigma\eta^k$, but
with that of $\sigma H_k(\eta)$, where $H_k(x)$ is the k-th Hermite polynomial with leading
coefficient 1. The identities $E H_k(\eta) = 0$, $E H_k(\eta)^2 = k!$ hold. This means that
not only the tail distributions of $k!\,Z_{\mu,k}(f)$ and $\sigma H_k(\eta)$ are similar, but in the case
of a symmetric function $f$ also their first two moments agree.
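The two Hermite identities are easy to verify exactly for small $k$. The following sketch builds the monic Hermite polynomials from the standard three-term recursion $H_{m+1}(x) = xH_m(x) - mH_{m-1}(x)$ and integrates them against the standard normal law through the moment formula $E\eta^{2m} = 1 \cdot 3 \cdots (2m-1)$; all function names are illustrative.

```python
from math import factorial


def hermite_coeffs(k):
    """Integer coefficients (lowest degree first) of the monic Hermite polynomial H_k."""
    prev, cur = [1], [0, 1]  # H_0(x) = 1, H_1(x) = x
    if k == 0:
        return prev
    for m in range(1, k):
        # H_{m+1}(x) = x * H_m(x) - m * H_{m-1}(x)
        nxt = [0] + cur
        for i, c in enumerate(prev):
            nxt[i] -= m * c
        prev, cur = cur, nxt
    return cur


def gaussian_moment(p):
    """E eta^p for a standard normal eta: 0 for odd p, 1*3*...*(p-1) for even p."""
    if p % 2:
        return 0
    result = 1
    for j in range(1, p, 2):
        result *= j
    return result


def expect_poly(coeffs):
    """E q(eta) for the polynomial q with the given coefficients."""
    return sum(c * gaussian_moment(i) for i, c in enumerate(coeffs))


def poly_mul(a, b):
    """Coefficients of the product of two polynomials."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


if __name__ == "__main__":
    for k in range(1, 7):
        h = hermite_coeffs(k)
        print(k, expect_poly(h), expect_poly(poly_mul(h, h)), factorial(k))
```

Because everything is computed with integers, the printed second moments agree with $k!$ exactly and not only up to rounding.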

In problems a) and a') a slightly weaker but similar estimate holds. In the 
case of problem a') the following result is valid (see EDI). 

Theorem 4.3. Let $\xi_1,\dots,\xi_n$ be a sequence of independent and identically
distributed random variables on a space $(X, \mathcal{X})$ with some distribution $\mu$. Let us
consider a function $f(x_1,\dots,x_k)$ on the space $(X^k, \mathcal{X}^k)$, canonical with respect
to the measure $\mu$ which satisfies the conditions

$$\|f\|_\infty = \sup_{x_j \in X,\ 1 \le j \le k} |f(x_1,\dots,x_k)| \le 1, \tag{27}$$

$$\|f\|_2^2 = \int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k) \le \sigma^2, \tag{28}$$

with some $0 < \sigma^2 \le 1$ together with the degenerate U-statistic $I_{n,k}(f)$ defined
earlier with this kernel function $f$. There exist some constants $A = A(k) > 0$
and $B = B(k) > 0$ depending only on the order $k$ of the U-statistic $I_{n,k}(f)$
such that

$$P(k!\,n^{-k/2}|I_{n,k}(f)| > u) \le A \exp\left\{-\frac{u^{2/k}}{2\sigma^{2/k}\left(1 + B\left(u n^{-k/2}\sigma^{-(k+1)}\right)^{1/k}\right)}\right\} \tag{29}$$

for all $0 \le u \le n^{k/2}\sigma^{k+1}$.

Remark: Actually, the universal constant $B > 0$ can be chosen independently of
the order $k$ of the degenerate U-statistic $I_{n,k}(f)$ in inequality (29).

Theorem 4.3 can be considered as a generalization of Bernstein's inequality
Theorem 2.2 to the multivariate case in a slightly weaker form when only the sum
of independent and identically distributed random variables is considered. Its
statement, inequality (29), does not contain an explicit value for the constants $A$
and $B$, which are equal to $A = 2$ and $B = \frac{1}{3}$ in the case of Bernstein's inequality.
(The constant $A = 2$ appears because of the absolute value in the probability
at the left-hand side of (29).) There is a formal difference between the formula of
Bernstein's inequality and the statement of formula (29) in the case $k = 1$, because in formula (29) the
U-statistic $I_{n,k}(f)$ of order $k$ is multiplied by $n^{-k/2}$. Another difference between
them is that inequality (29) in Theorem 4.3 is stated under the condition
$0 \le u \le n^{k/2}\sigma^{k+1}$, and this restriction has no counterpart in Bernstein's inequality. But,
as I shall show, Theorem 4.3 also contains an estimate for $u \ge n^{k/2}\sigma^{k+1}$ in an
implicit way, and it can be considered as the multivariate version of Bernstein's
inequality.
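The relation to Bernstein's inequality in the case $k = 1$ can be made concrete with a short numerical sketch. It evaluates a bound of the form $A\exp\{-u^{2/k}/(2\sigma^{2/k}(1 + B(un^{-k/2}\sigma^{-(k+1)})^{1/k}))\}$ on the range $0 \le u \le n^{k/2}\sigma^{k+1}$ and compares it, for $k = 1$, $A = 2$, $B = 1/3$, with the classical two-sided Bernstein bound for sums of independent random variables bounded by 1 with variance $\sigma^2$. The theorem does not specify $A$ and $B$ for $k \ge 2$, so any values passed for them there are purely hypothetical; the function names are also illustrative.

```python
import math


def multivariate_bound(u, n, sigma, k, A, B):
    """A * exp(-u^{2/k} / (2 sigma^{2/k} (1 + B (u n^{-k/2} sigma^{-(k+1)})^{1/k})))."""
    upper = n ** (k / 2) * sigma ** (k + 1)
    if not 0 <= u <= upper:
        raise ValueError("the bound is stated only for 0 <= u <= n^{k/2} sigma^{k+1}")
    ratio = (u / upper) ** (1 / k)
    exponent = u ** (2 / k) / (2 * sigma ** (2 / k) * (1 + B * ratio))
    return A * math.exp(-exponent)


def bernstein_bound(u, n, sigma):
    """Two-sided Bernstein bound for P(n^{-1/2} |S_n| > u), |xi_j| <= 1, Var xi_j = sigma^2."""
    return 2 * math.exp(-u ** 2 / (2 * (sigma ** 2 + u / (3 * math.sqrt(n)))))


if __name__ == "__main__":
    n, sigma = 100, 0.5
    for u in (0.5, 1.0, 2.0):
        print(u, multivariate_bound(u, n, sigma, 1, 2.0, 1.0 / 3.0), bernstein_bound(u, n, sigma))
```

With these constants the two functions return the same numbers for $k = 1$, which is the sense in which the multivariate estimate extends Bernstein's inequality on the restricted range of $u$.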

Bernstein's inequality gives a good estimate only if $0 \le u \le K\sqrt{n}\sigma^2$ with
some $K > 0$ (with the normalization of Theorem 4.3, i.e. if the probability
$P\left(n^{-1/2}\sum_{k=1}^{n} \xi_k > u\right)$
is considered). In the multivariate case a similar picture appears. We get a
good estimate for problem a') suggested by Theorem 4.1 only under the condition
$0 \le u \le \text{const.}\, n^{k/2}\sigma^{k+1}$. If $0 < u < \varepsilon n^{k/2}\sigma^{k+1}$ with a sufficiently small
$\varepsilon > 0$, then Theorem 4.3 implies the inequality
$P(k!\,n^{-k/2}|I_{n,k}(f)| > u) \le A \exp\left\{-\frac{1 - C\varepsilon^{1/k}}{2}\left(\frac{u}{\sigma}\right)^{2/k}\right\}$
with some universal constants $A > 0$ and $C > 0$ depending only on the order $k$ of the
U-statistic $I_{n,k}(f)$. This means that in this
case Theorem 4.3 yields an almost as good estimate as Theorem 4.1 about the
distribution of multiple Wiener-Itô integrals. We have seen that Bernstein's inequality
has a similar property if its estimate is compared with the central
limit theorem in the case $0 < u < \varepsilon\sqrt{n}\sigma^2$ with a small $\varepsilon > 0$.

To see what kind of estimate Theorem 4.3 yields in the case $u \ge n^{k/2}\sigma^{k+1}$
let us observe that in condition (28) we have an inequality and not an identity.
Hence in the case $n^{k/2} \ge u > n^{k/2}\sigma^{k+1}$ relation (29) holds with
$\bar\sigma = \left(u n^{-k/2}\right)^{1/(k+1)}$ in place of $\sigma$, and this yields that

$$P(k!\,n^{-k/2}|I_{n,k}(f)| > u) \le A \exp\left\{-\frac{1}{2(1+B)}\left(\frac{u}{\bar\sigma}\right)^{2/k}\right\} = A \exp\left\{-\frac{(u^2 n)^{1/(k+1)}}{2(1+B)}\right\}.$$

(The inequality $n^{k/2} \ge u$ was imposed to satisfy the condition $0 < \bar\sigma^2 \le 1$.) If
$u > n^{k/2}$, then the probability at the left-hand side of (29) equals zero because
of condition (27). It is not difficult to see by means of the above calculation that
Theorem 4.3 implies the inequality

$$P(k!\,n^{-k/2}|I_{n,k}(f)| > u) \le c_1 \exp\left\{-\frac{c_2 u^{2/k}}{\sigma^{2/k} + c_3\left(u n^{-k/2}\right)^{2/(k(k+1))}}\right\} \quad \text{for all } u > 0 \tag{30}$$

with some universal constants $c_1$, $c_2$ and $c_3$ depending only on the order $k$ of
the U-statistic $I_{n,k}(f)$, if the conditions of Theorem 4.3 hold. Inequality (30)
holds for all $u > 0$. Arcones and Giné formulated and proved this estimate in
a slightly different but equivalent form in paper [3] under the name generalized
Bernstein's inequality. This result is weaker than Theorem 4.3 since it does not
give a good value for the constant $c_2$. The method of paper [3] is based on a
symmetrization argument. Symmetrization arguments can be very useful in the
study of problems b) and b') formulated in the Introduction, but they cannot
supply a proof of Theorem 4.3 with good constants because of some principal
reasons.

The following result which can be considered as a solution of problem a)
is a fairly simple consequence of Theorem 4.3, Theorem 3.2 and formulas (20)
and (21).

Theorem 4.4. Let us take a sequence of independent and identically distributed
random variables $\xi_1,\dots,\xi_n$ on a measurable space $(X, \mathcal{X})$ with a non-atomic
distribution $\mu$ on it together with a measurable function $f(x_1,\dots,x_k)$ on the
k-fold product $(X^k, \mathcal{X}^k)$ of the space $(X, \mathcal{X})$ with some $k \ge 1$ which satisfies
conditions (27) and (28) with some constant $0 < \sigma \le 1$. Then there exist some
constants $C = C_k > 0$ and $\alpha = \alpha_k > 0$ such that the random integral $J_{n,k}(f)$
defined by formulas (1) and (2) with this sequence of random variables $\xi_1,\dots,\xi_n$
and function $f$ satisfies the inequality

$$P(|J_{n,k}(f)| > u) \le C \exp\left\{-\alpha\left(\frac{u}{\sigma}\right)^{2/k}\right\} \quad \text{for all } 0 < u \le n^{k/2}\sigma^{k+1}. \tag{31}$$

Theorem 4.4 provides a slightly weaker estimate on the probability considered
in Problem a) than Theorem 4.3 about its counterpart in Problem a'). It does not
give an almost optimal constant $\alpha$ in the inequality (31) for $0 < u < \varepsilon n^{k/2}\sigma^{k+1}$
with a small $\varepsilon > 0$. On the other hand, this estimate is sharp in the sense that,
disregarding the value of the universal constant $\alpha$, it cannot be improved. It
seems to be appropriate in the solution of the problems about non-parametric
maximum likelihood estimates mentioned in the Introduction.

The estimate (31) on the probability $P(|J_{n,k}(f)| > u)$ can be rewritten, similarly
to relation (30), in such a form which holds for all $u > 0$. On the other hand,
both Theorem 4.3 and Theorem 4.4 yield a very weak estimate if $u \gg n^{k/2}\sigma^{k+1}$.
We met a similar situation in Section 2 when these problems were investigated
in the case $k = 1$. It is natural to expect that a generalization of Bennett's
inequality holds in the multivariate case $k \ge 2$, and it gives an improvement of
estimates (29) and (31) in the case $u \gg n^{k/2}\sigma^{k+1}$ for all $k \ge 1$. I can prove only
partial results in this direction which are not sharp in the general case. On the
other hand, there is a possibility to give such a generalization of Example 2.4
which shows that the inequalities implied by Theorem 4.3 or 4.4 in the case
$u \gg n^{k/2}\sigma^{k+1}$, $k \ge 2$, have only a slight improvement.

The results of Theorems 4.3 and 4.4 imply that in the case $u \le n^{k/2}\sigma^{k+1}$
under the condition of these results the probabilities $P(n^{-k/2}|I_{n,k}(f)| > u)$ and
$P(|J_{n,k}(f)| > u)$ can be bounded by $P(C\sigma|\eta|^k > u)$ with an appropriate universal
constant $C = C(k) > 0$ depending only on the order $k$ of the degenerate
U-statistic $I_{n,k}(f)$ or of the multiple random integral $J_{n,k}(f)$, where the random
variable $\eta$ has standard normal distribution, and

$$\sigma^2 = \int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k).$$

A generalization of Example 2.4 can be given which shows for all $k \ge 1$ that in
the case $u \gg n^{k/2}\sigma^{k+1}$ we can have only a much weaker estimate. I shall present
such an example only for $k = 2$, but it can be generalized for all $k \ge 1$. This
example is taken from my Lecture Note (Example 8.6). Here I present it
without a detailed proof. The proof which exploits the properties of Example 2.4
is not long. But I found it more instructive to explain the idea behind this example.


Example 4.5. Let $\xi_1,\dots,\xi_n$ be a sequence of independent, identically
distributed random variables taking values in the plane, i.e. in $X = R^2$,
such that $\xi_j = (\eta_{j,1}, \eta_{j,2})$, $\eta_{j,1}$ and $\eta_{j,2}$ are independent, $P(\eta_{j,1} = 1) = P(\eta_{j,1} = -1) = \frac{\sigma^2}{8}$,
$P(\eta_{j,1} = 0) = 1 - \frac{\sigma^2}{4}$, $P(\eta_{j,2} = 1) = P(\eta_{j,2} = -1) = \frac{1}{2}$ for all $1 \le j \le n$.
Let us introduce the function $f(x,y) = f((x_1,x_2),(y_1,y_2)) = x_1y_2 + x_2y_1$,
$x = (x_1,x_2) \in R^2$, $y = (y_1,y_2) \in R^2$ on $X^2$, and define the U-statistic

$$I_{n,2}(f) = \sum_{1 \le j,k \le n,\ j \ne k} \left(\eta_{j,1}\eta_{k,2} + \eta_{k,1}\eta_{j,2}\right) \tag{32}$$

of order 2 with the above kernel function $f$ and the sequence of independent
random variables $\xi_1,\dots,\xi_n$. Then $I_{n,2}(f)$ is a degenerate U-statistic. If
$u \ge B_1 n\sigma^3$ with some appropriate constant $B_1 > 0$, $B_2^{-1}n \ge u \ge B_2 n^{-2}$ with a
sufficiently large fixed number $B_2 > 0$, and $1 \ge \sigma \ge \frac{1}{n}$, then the estimate

$$P(n^{-1}I_{n,2}(f) > u) \ge \exp\left\{-Bn^{1/3}u^{2/3}\log\left(\frac{u}{n\sigma^3}\right)\right\} \tag{33}$$

holds with some constant $B > 0$ depending neither on $n$ nor on $\sigma$.

It is not difficult to see that the U-statistic $I_{n,2}(f)$ introduced in Example 4.5
is a degenerate U-statistic of order two with a kernel function $f$ such
that sup |/(a;,?/)| < 1 and cr^ = f P{x, y)p,{ dx)p{ dy) = = cr^. 

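Example 4.5 means that in the case $u \gg n\sigma^3$ (i.e. if $u \gg n^{k/2}\sigma^{k+1}$ with $k = 2$)
a much weaker estimate holds than in the case $u \le n\sigma^3$. Let us fix the numbers
$u$ and $n$, and consider the dependence of our estimate on $\sigma$. The estimate
$P(n^{-1}|I_{n,2}(f)| > u) \le A e^{-Ku^{2/3}n^{1/3}}$ holds with some constants $A > 0$ and $K > 0$
if $\sigma = u^{1/3}n^{-1/3}$, and Example 4.5 shows that a rather weak improvement appears
if $\sigma \ll u^{1/3}n^{-1/3}$.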

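To understand why the statement of Example 4.5 holds observe that a small
error is made if the condition $j \ne k$ is omitted from the summation in formula
(32), and this suggests that the approximation

$$\frac{1}{n} I_{n,2}(f) \sim \frac{2}{n}\left(\sum_{j=1}^{n} \eta_{j,1}\right)\left(\sum_{j=1}^{n} \eta_{j,2}\right)$$

causes a negligible error. This fact together with the independence of the
sequences $\eta_{j,1}$, $1 \le j \le n$, and $\eta_{j,2}$, $1 \le j \le n$, imply that

$$P(n^{-1}I_{n,2}(f) > u) \sim P\left(\left(\sum_{j=1}^{n} \eta_{j,1}\right)\left(\sum_{j=1}^{n} \eta_{j,2}\right) > \frac{nu}{2}\right)
\ge P\left(\sum_{j=1}^{n} \eta_{j,1} > v_1\right) P\left(\sum_{j=1}^{n} \eta_{j,2} > v_2\right) \tag{34}$$

with such a choice of numbers $v_1$ and $v_2$ for which $v_1v_2 = \frac{nu}{2}$.
The first probability at the right-hand side of (34) can be bounded because
of the result of Example 2.4 as $P\left(\sum_{j=1}^{n} \eta_{j,1} > v_1\right) \ge e^{-Bv_1\log(4v_1/n\sigma^2)}$ if
$v_1 \ge 4n\sigma^2$, and the second probability as
$P\left(\sum_{j=1}^{n} \eta_{j,2} > v_2\right) \ge Ce^{-Kv_2^2/n}$ with some
appropriate $C > 0$ and $K > 0$ if $0 \le v_2 \le n$. The proof of Example 4.5 can be
obtained by means of an appropriate choice of the numbers $v_1$ and $v_2$.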
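The step of dropping the condition $j \ne k$ can be illustrated with a small simulation. The sketch below assumes the distribution $P(\eta_{j,1} = \pm 1) = \sigma^2/8$, $P(\eta_{j,1} = 0) = 1 - \sigma^2/4$, $P(\eta_{j,2} = \pm 1) = 1/2$, which is a reading of the partly illegible source; the algebraic identity it checks, $\sum_{j \ne k}(\eta_{j,1}\eta_{k,2} + \eta_{k,1}\eta_{j,2}) = 2\left(\sum_j \eta_{j,1}\right)\left(\sum_j \eta_{j,2}\right) - 2\sum_j \eta_{j,1}\eta_{j,2}$, holds for any sample. The function names are illustrative.

```python
import random


def sample_point(sigma, rng):
    """One draw (eta_1, eta_2); the probabilities used here are an assumption."""
    p = sigma ** 2 / 8
    r = rng.random()
    eta1 = 1 if r < p else (-1 if r < 2 * p else 0)
    eta2 = 1 if rng.random() < 0.5 else -1
    return eta1, eta2


def u_stat_double_sum(points):
    """Sum over j != k of (eta_{j,1} eta_{k,2} + eta_{k,1} eta_{j,2}), computed directly."""
    n = len(points)
    return sum(points[j][0] * points[k][1] + points[k][0] * points[j][1]
               for j in range(n) for k in range(n) if j != k)


def u_stat_closed_form(points):
    """The same quantity as 2 * (S_1 * S_2 - diagonal), where S_i = sum of eta_{j,i}."""
    s1 = sum(p[0] for p in points)
    s2 = sum(p[1] for p in points)
    diagonal = sum(p[0] * p[1] for p in points)
    return 2 * (s1 * s2 - diagonal)


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [sample_point(0.9, rng) for _ in range(40)]
    print(u_stat_double_sum(pts), u_stat_closed_form(pts))
```

The diagonal correction $2\sum_j \eta_{j,1}\eta_{j,2}$ has absolute value at most $2n$, which is why omitting the condition $j \ne k$ changes $n^{-1}I_{n,2}(f)$ only by a bounded amount.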


In Theorem 4.1 the distribution of a k-fold Wiener-Itô integral $Z_{\mu,k}(f)$ was
bounded by the distribution of $\sigma\eta^k$ with a standard normal random variable $\eta$
and an appropriate constant $\sigma$. By Theorems 4.3 and 4.4 a similar, but weaker
estimate holds for the distribution of a degenerate U-statistic $I_{n,k}(f)$ or random
integral $J_{n,k}(f)$. In the next section I briefly explain why such results hold.

There is a method to get a good estimate on the moments of the random
variables considered in the above theorems, and they enable us to get a good
estimate also on the distribution of the random integrals and U-statistics
appearing in these theorems. The moments of a k-fold Wiener-Itô integral can be
bounded by the moments of $\sigma\eta^k$ with an appropriate $\sigma > 0$, and this estimate
implies Theorem 4.1. Theorems 4.3 and 4.4 can be proved in a similar way.
But we can give a good estimate only on not too high moments of the random
variables $I_{n,k}(f)$ and $J_{n,k}(f)$, and this is the reason why we get only a weaker
result for their distribution.

Remark: My goal was to obtain a good estimate in Problems a) and a') if
we have a bound on the $L_2$ and $L_\infty$ norm of the kernel function $f$ in them.
A similar problem was considered in Problem a'') about Wiener-Itô integrals
with the difference that in this case only an $L_2$ bound of the function $f$ is
needed. Theorems 4.1, 4.3 and 4.4 provided such a bound, and as Example 4.2
shows these estimates are sharp. On the other hand, if we have some additional
information about the kernel function $f$, then more precise estimates can be
given which in certain cases yield an essential improvement. Such results were
known for [/-statistics and Wiener-Ito integrals of order k = 2, (see and m) 
and quite recently (after the submission of the first version of this work) they 
were generalized in ^ and m to general k >2. Moreover, these improvements 
are useful in the study of some problems. Hence a referee suggested to explain 
them in the present work. I try to follow his advice by inserting their discussion 
at the end, in the open problems part of the paper. 

5. On the proof of the results in Section 4

Theorem 4.1 can be proved by means of the following

Proposition 5.1. Let the conditions of Theorem 4.1 be satisfied for a multiple
Wiener-Itô integral $Z_{\mu,k}(f)$ of order $k$. Then, with the notations of Theorem 4.1,
the inequality

$$E\left(k!\,|Z_{\mu,k}(f)|\right)^{2M} \le 1 \cdot 3 \cdot 5 \cdots (2kM - 1)\,\sigma^{2M} \tag{35}$$

holds for all $M = 1, 2, \dots$.

By the Stirling formula Proposition 5.1 implies that

$$E\left(k!\,|Z_{\mu,k}(f)|\right)^{2M} \le \frac{(2kM)!}{2^{kM}(kM)!}\,\sigma^{2M} \le A\left(\frac{2}{e}\right)^{kM}(kM)^{kM}\sigma^{2M} \tag{36}$$

for any $A > \sqrt{2}$ if $M \ge M_0 = M_0(A)$, and this estimate is sharp. The following
Proposition 5.2 which can be applied in the proof of Theorem 4.3 states a similar,
but weaker inequality for the moments of normalized degenerate U-statistics.

Proposition 5.2. Let us consider a degenerate U-statistic $I_{n,k}(f)$ of order $k$
with sample size $n$ and with a kernel function $f$ satisfying relations (27) and
(28) with some $0 < \sigma^2 \le 1$. Fix a positive number $\eta > 0$. There exist some
universal constants $A = A(k) > \sqrt{2}$, $C = C(k) > 0$ and $M_0 = M_0(k) \ge 1$
depending only on the order of the U-statistic $I_{n,k}(f)$ such that

$$E\left(n^{-k/2}k!\,I_{n,k}(f)\right)^{2M} \le A\left(1 + C\sqrt{\eta}\right)^{2kM}\left(\frac{2}{e}\right)^{kM}(kM)^{kM}\sigma^{2M} \tag{37}$$

for all integers $M$ such that $kM_0 \le kM \le \eta n\sigma^2$.

The constant $C = C(k)$ in formula (37) can be chosen e.g. as $C = 2\sqrt{2}$ which
does not depend on the order $k$ of the U-statistic $I_{n,k}(f)$.

Formula (35) can be reformulated as $E\left(k!\,|Z_{\mu,k}(f)|\right)^{2M} \le E(\sigma\eta^k)^{2M}$, where $\eta$ is
a standard normal random variable. Theorem 4.1 states that the tail distribution
of $k!\,|Z_{\mu,k}(f)|$ satisfies an estimate similar to that of $\sigma|\eta|^k$. This can be deduced
relatively simply from Proposition 5.1 and the Markov inequality
$P(k!\,|Z_{\mu,k}(f)| > u) \le \frac{E\left(k!\,|Z_{\mu,k}(f)|\right)^{2M}}{u^{2M}}$ with an appropriate choice of the parameter $M$.
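The passage from moment bounds to a tail estimate can be followed numerically. The sketch below first compares the product $1 \cdot 3 \cdots (2m-1) = (2m)!/(2^m m!)$ with its Stirling-type main term $(2/e)^m m^m$, whose ratio increases to $\sqrt{2}$, and then minimizes over integers $M$ the logarithm of the Markov bound $(2/e)^{kM}(kM)^{kM}\sigma^{2M}/u^{2M}$ (the constant factor $A$ is left out). The optimum is close to $-\frac{1}{2}(u/\sigma)^{2/k}$, the exponent of the tail estimate. The function names and the search range `max_M` are illustrative choices.

```python
import math


def log_double_factorial_odd(m):
    """log of 1*3*5*...*(2m-1), written as (2m)! / (2^m m!)."""
    return math.lgamma(2 * m + 1) - m * math.log(2) - math.lgamma(m + 1)


def log_stirling_form(m):
    """log of (2/e)^m * m^m."""
    return m * (math.log(2) - 1) + m * math.log(m)


def moment_ratio(m):
    """Ratio of the exact product to its Stirling-type main term."""
    return math.exp(log_double_factorial_odd(m) - log_stirling_form(m))


def best_markov_exponent(u, sigma, k, max_M=500):
    """Return (M, value) minimizing log[(2/e)^{kM} (kM)^{kM} sigma^{2M} / u^{2M}]."""
    best = None
    for M in range(1, max_M + 1):
        value = log_stirling_form(k * M) + 2 * M * math.log(sigma / u)
        if best is None or value < best[1]:
            best = (M, value)
    return best


if __name__ == "__main__":
    print([round(moment_ratio(m), 5) for m in (1, 10, 100, 1000)], math.sqrt(2))
    u, sigma, k = 50.0, 1.0, 2
    M, value = best_markov_exponent(u, sigma, k)
    print(M, value, -0.5 * (u / sigma) ** (2 / k))
```

The minimizing $M$ satisfies $kM \approx \frac{1}{2}(u/\sigma)^{2/k}$, so large levels $u$ require high moments. This explains why a moment estimate that is valid only for a restricted range of $M$ yields a tail estimate only for a restricted range of $u$.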

Proposition 5.2 gives a bound on the moments of $k!\,n^{-k/2}I_{n,k}(f)$ similar to
the estimate (36) on the moments of $k!\,Z_{\mu,k}(f)$. The difference between them is
that estimate (37) in Proposition 5.2 contains a factor $(1 + C\sqrt{\eta})^{2kM}$ at its
right-hand side, and it holds only for such moments $E\left(k!\,n^{-k/2}I_{n,k}(f)\right)^{2M}$ for
which $kM_0 \le kM \le \eta n\sigma^2$ with some constant $M_0$. The parameter $\eta > 0$ in
relation (37) can be chosen in an arbitrary way, but it yields a really useful
estimate only for not too large values. Theorem 4.3 can be proved by means of
the estimate in Proposition 5.2 and the Markov inequality. But because of the
relatively weak estimate of Proposition 5.2 only the estimate of Theorem 4.3
can be proved for degenerate U-statistics. The main step both in the proof of
Theorem 4.1 and 4.3 is to get good moment estimates.

An important result of probability theory, the so-called diagram formula
about multiple Wiener-Itô integrals, can be applied in the proof of Proposition 5.1.
This result can be found in the literature cited in the references. It enables us to rewrite the product
of Wiener-Itô integrals as a sum of Wiener-Itô integrals of different order. It
got the name 'diagram formula', because the kernel functions of the Wiener-Itô
integrals appearing in the sum representation of the product of Wiener-Itô integrals
are defined with the help of certain diagrams. As the expectation of a
Wiener-Itô integral of order $k$ equals zero for all $k \ge 1$, the expectation of the
product equals the sum of the constant terms (i.e. of the integrals of order zero)
in the diagram formula. The sum of the constant terms in the diagram formula
can be bounded, and such a calculation leads to the proof of Proposition 5.1.

A version of the diagram formula can be proved both for the product of multiple
random integrals $J_{n,k}(f)$ defined in formula (2) and for degenerate
U-statistics (see the references), which expresses the product of multiple random integrals
or degenerate U-statistics as a sum of multiple random integrals or degenerate
U-statistics of different order. The main difference between these new and
the original diagram formula about Wiener-Itô integrals is that in the case of
random (non-Gaussian) integrals or degenerate U-statistics some new diagrams
appear, and they give an additional contribution in the sum representation of
the product of random integrals $J_{n,k}(f)$ or of degenerate U-statistics $I_{n,k}(f)$.

Proposition 5.2 can be proved by means of the diagram formula for the product
of degenerate U-statistics and a good bound on the contribution of all integrals
corresponding to the diagrams. Theorem 4.4 can be proved similarly
by means of the diagram formula for the product of multiple random integrals
$J_{n,k}(f)$ (see the references). The main difficulty of such an approach arises, because
the expected value of a k-fold random integral $J_{n,k}(f)$ (unlike that of a Wiener-Itô
integral or degenerate U-statistic) may be non-zero also in the case $k \ge 1$.
The expectation of all these integrals is small, but since the diagram formula
contains a large number of such terms, it cannot supply such a sharp estimate
for the moments of random integrals $J_{n,k}(f)$ as we have for degenerate U-statistics
$I_{n,k}(f)$. On the other hand, Theorem 4.4 can be deduced from Theorems 4.3, 3.2
and formulas (20) and (21).

Remark: The diagram formula is an important tool both in investigations in 
probability theory and statistical physics. The second chapter of the book ED 
contains a detailed discussion of this formula. Paper m explains the combinato¬ 
rial picture behind it, and it contains some interesting generalizations. Paper m 
is interesting because of a different reason. It shows how to prove central limit 
theorems for stationary processes in some non-trivial cases by means of the di¬ 
agram formula. In this paper it is proved that the moments of the normalized 
partial sums have the right limit as the number of terms in them tends to in¬ 
finity. Actually, the limit of the semi-invariants is investigated, but this can be 
considered as an appropriate reformulation of the study of the moments. The 
approach in paper lai and the proof of the results mentioned in this work show 
some similarity, but there is also an essential difference between them. In pa¬ 
per m the limit of fixed moments is investigated, while e.g. in Problem a') we 
want to get good asymptotics for such moments of U-statistics $I_{n,k}(f)$ whose
order may depend on the sample size $n$ of the U-statistic. The reason behind
this difference is that we want to get a good estimate of the probabilities defined 
in Problem a') also for large numbers u, and this yields some large deviation 
character to the problem. 

The statement of Example 4.2 follows relatively simply from another important
result about multiple Wiener-Ito integrals, from the so-called Ito formula
for multiple Wiener-Ito integrals (see e.g. m or ini), which enables us to express
the random integrals considered in Example 4.2 as the Hermite polynomial
of an appropriately defined standard normal random variable.

Here I did not formulate the diagram formula, hence I cannot explain the 
details of the proof of Propositions o and O I discuss instead an analogous, 
but simpler problem briefly which may help in capturing the ideas behind the 
proofs outlined above. 

Let us consider a sequence of independent and identically distributed random
variables $\xi_1,\dots,\xi_n$ with expectation zero, take their sum $S_n=\sum_{j=1}^n\xi_j$, and let us
try to give a good estimate on the moments $ES_n^{2M}$ for all $M = 1,2,\dots$. Because
of the independence of the random variables $\xi_j$ and the condition $E\xi_j = 0$ we
can write

$$ES_n^{2M}=\sum_{\substack{(j_1,\dots,j_s,\,l_1,\dots,l_s)\colon\\ j_1+\cdots+j_s=2M,\ j_u\ge2 \text{ for all } 1\le u\le s,\\ l_u\neq l_{u'} \text{ if } u\neq u'}} E\xi_{l_1}^{j_1}\cdots E\xi_{l_s}^{j_s}. \tag{38}$$

Simple combinatorial considerations show that a dominating number of terms
at the right-hand side of (38) are indexed by a vector $(j_1,\dots,j_M,l_1,\dots,l_M)$
such that $j_u = 2$ for all $1\le u\le M$, and the number of such vectors is equal
to $\binom{n}{M}\frac{(2M)!}{2^M}\sim n^M\frac{(2M)!}{2^M M!}$. The last asymptotic relation holds if the number $n$
of terms in the random sum $S_n$ is sufficiently large. The above considerations
suggest that under not too restrictive conditions
$ES_n^{2M}\sim(n\sigma^2)^M\frac{(2M)!}{2^M M!}=E\eta_{n\sigma^2}^{2M}$, where $\sigma^2=E\xi_j^2$ is the variance of the terms in the sum $S_n$, and $\eta_u$ is a
random variable with normal distribution with expectation zero and variance $u$.
The question arises when the above heuristic argument gives a right estimate.

For the sake of simplicity let us restrict our attention to the case when the
absolute value of the random variables $\xi_j$ is bounded by 1. Let us observe that
even in this case we have to impose a condition that the variance $\sigma^2$ of the
random variables $\xi_j$ is not too small. Indeed, let us consider such random
variables $\xi_j$, for which $P(\xi_j = 1) = P(\xi_j = -1) = \frac{\sigma^2}{2}$, $P(\xi_j = 0) = 1 - \sigma^2$. These
random variables have variance $\sigma^2$, and the contribution of the terms $E\xi_j^{2M}$,
$1\le j\le n$, to the sum in (38) equals $n\sigma^2$. If $\sigma^2$ is very small, then it may
occur that $n\sigma^2\gg(n\sigma^2)^M\frac{(2M)!}{2^M M!}$, and the approximation given for $ES_n^{2M}$ in
the previous paragraph does not hold any longer. Let us observe that for larger
moments $ES_n^{2M}$ the choice of a smaller variance is sufficient to violate the
asymptotic relation obtained by this approximation.
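The heuristic above can be checked by exact arithmetic in the three-point model just described. The following is only a minimal sketch: the function names `moments_of_sum` and `gaussian_even_moment` are illustrative and not taken from the paper, and the moments of the sum are computed by the binomial convolution $E(X+Y)^m=\sum_i\binom{m}{i}EX^iEY^{m-i}$ for independent $X$ and $Y$.

```python
from fractions import Fraction
from math import comb, factorial


def moments_of_sum(n, sigma2, max_order):
    """Exact moments E S_n^m, m = 0..max_order, of S_n = xi_1 + ... + xi_n,
    where the xi_j are i.i.d. with P(xi=1) = P(xi=-1) = sigma2/2 and
    P(xi=0) = 1 - sigma2 (illustrative helper, not from the paper)."""
    sigma2 = Fraction(sigma2)
    # Moments of one term: E xi^m = sigma2 for even m >= 2 and 0 for odd m.
    single = [Fraction(1)] + [sigma2 if m % 2 == 0 else Fraction(0)
                              for m in range(1, max_order + 1)]
    total = [Fraction(1)] + [Fraction(0)] * max_order  # moments of the empty sum
    for _ in range(n):
        # Add one more independent term by binomial convolution of moments.
        total = [sum(comb(m, i) * total[i] * single[m - i] for i in range(m + 1))
                 for m in range(max_order + 1)]
    return total


def gaussian_even_moment(variance, M):
    """E eta^(2M) = variance^M * (2M)! / (2^M * M!) for a centred normal eta."""
    return Fraction(variance) ** M * Fraction(factorial(2 * M), 2 ** M * factorial(M))
```

With $\sigma^2=1$ and $n=1000$ the fourth moment returned by `moments_of_sum` is $3n^2-2n$, close to the Gaussian value $3n^2$. With $n=100$ and $\sigma^2=10^{-4}$ the term $n\sigma^2$ dominates, and the exact fourth moment is more than thirty times larger than the Gaussian prediction.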

A similar picture arises in Proposition 5.2. If the variance of the random
variable $I_{n,k}(f)$ is not too small, then those terms give the essential contribution
to the moments of $I_{n,k}(f)$ which correspond to such diagrams which appear also
in the diagram formula for Wiener-Ito integrals. The higher the moment we estimate,
the stronger the condition we have to impose on the variance of $I_{n,k}(f)$ to preserve
this property and to get a good bound on the moment we consider.

In the next Section problems b), b') and b") will be discussed, where the
distribution of the supremum of multiple random integrals $J_{n,k}(f)$, degenerate
$U$-statistics $I_{n,k}(f)$ and multiple Wiener-Ito integrals $Z_{\mu,k}(f)$ will be estimated
for an appropriate class of functions $f\in\mathcal{F}$. Under some appropriate conditions
for the class of functions $\mathcal{F}$ a similar estimate can be proved in these problems
as in their natural counterpart when only one function is taken. The only
difference is that worse universal constants may appear in the new estimates. The
conditions we had to impose in the results about problems a) and a') appear in
their counterparts problems b) and b') in a natural way. But these conditions
also have some hidden, more surprising consequences in the study of the new
problems.

6. On the supremum of random integrals and U-statistics

To formulate the results of this section first I introduce some notions which 
appear in their formulation. Such properties will be introduced which say about 
a class of functions that it has relatively small and in some sense dense finite 
subsets. 

First I introduce the following definition. 

Definition of $L_p$-dense classes of functions with respect to some measure.
Let us have a measurable space $(Y,\mathcal{Y})$ together with a $\sigma$-finite measure
$\nu$ and a set $\mathcal{G}$ of $\mathcal{Y}$ measurable real valued functions on this space. For all
$1\le p<\infty$, we say that $\mathcal{G}$ is an $L_p$-dense class with respect to $\nu$ and with
parameter $D$ and exponent $L$ if for all numbers $1\ge\varepsilon>0$ there exists a finite
$\varepsilon$-dense subset $\mathcal{G}_\varepsilon=\{g_1,\dots,g_m\}\subset\mathcal{G}$ in the space $L_p(Y,\mathcal{Y},\nu)$ consisting of
$m\le D\varepsilon^{-L}$ elements, i.e. there exists a set $\mathcal{G}_\varepsilon\subset\mathcal{G}$ with $m\le D\varepsilon^{-L}$ elements
such that $\inf_{g_j\in\mathcal{G}_\varepsilon}\int|g-g_j|^p\,d\nu<\varepsilon^p$ for all functions $g\in\mathcal{G}$.

The following notion will also be needed. 

Definition of $L_p$-dense classes of functions. Let us have a measurable space
$(Y,\mathcal{Y})$ and a set $\mathcal{G}$ of $\mathcal{Y}$ measurable real valued functions on this space. We call
$\mathcal{G}$ an $L_p$-dense class of functions, $1\le p<\infty$, with parameter $D$ and exponent $L$
if it is $L_p$-dense with parameter $D$ and exponent $L$ with respect to all probability
measures $\nu$ on $(Y,\mathcal{Y})$.
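To make the definition concrete, one can compute an $\varepsilon$-dense subset for a finite class of functions with respect to a discrete measure and compare its size with a bound of the form $D\varepsilon^{-L}$. The sketch below uses a simple greedy selection; the function names, the greedy strategy and the values $D=2$, $L=2$ in the usage note are illustrative assumptions, not part of the paper.

```python
import math


def l2_distance(f, g, weights):
    """L_2(nu) distance of two functions given by their values on the atoms
    of a discrete measure nu with the given weights."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(f, g, weights)))


def greedy_eps_net(functions, weights, eps):
    """Greedily select functions so that every function of the class is at
    L_2(nu) distance less than eps from some selected function."""
    net = []
    for f in functions:
        # Keep f only if no previously selected function is eps-close to it.
        if all(l2_distance(f, g, weights) >= eps for g in net):
            net.append(f)
    return net
```

For instance, take the uniform measure on $N=200$ atoms and the indicator functions of the initial segments of the atoms. Two such indicators whose segments differ in $m$ atoms are at distance $\sqrt{m/N}$, so with $\varepsilon=0.25$ the greedy net is well within the bound $2\varepsilon^{-2}=32$, which is consistent with exponent $L=2$ for this class.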

The above introduced properties can be considered as possible versions of 
the so-called $\varepsilon$-entropy frequently applied in the literature. Nevertheless, there
seems to exist no unanimously accepted version of this notion. Generally the 
above introduced definitions will be applied with the choice p = 2, but because 
of some arguments in this paper it was more natural to introduce them in a 
more general form. The first result I present can be considered as a solution of 
problem b"). 

Theorem 6.1. Let us consider a measurable space $(X,\mathcal{X})$ together with a
$\sigma$-finite non-atomic measure $\mu$ on it, and let $\mu_W$ be a white noise with reference
measure $\mu$ on $(X,\mathcal{X})$. Let $\mathcal{F}$ be a countable and $L_2$-dense class of functions
$f(x_1,\dots,x_k)$ on $(X^k,\mathcal{X}^k)$ with some parameter $D$ and exponent $L$ with respect
to the product measure $\mu^k$ such that

$$\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k)\le\sigma^2\quad\text{with some } 0<\sigma\le1 \text{ for all } f\in\mathcal{F}.$$

Let us consider the multiple Wiener integrals $Z_{\mu,k}(f)$ introduced in formula 0]
for all $f\in\mathcal{F}$. The inequality

$$P\Bigl(\sup_{f\in\mathcal{F}}|Z_{\mu,k}(f)|>u\Bigr)\le C(D+1)\exp\Bigl\{-\alpha\Bigl(\frac{u}{\sigma}\Bigr)^{2/k}\Bigr\}\quad\text{if }\Bigl(\frac{u}{\sigma}\Bigr)^{2/k}\ge ML\log\frac{2}{\sigma} \tag{39}$$

holds with some universal constants $C = C(k) > 0$, $M = M(k) > 0$ and
$\alpha=\alpha(k)>0$.

The next two results can be considered as a solution of problems b) and b'). 

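Theorem 6.2. Let a probability measure $\mu$ be given on a measurable space
$(X,\mathcal{X})$ together with a countable and $L_2$-dense class $\mathcal{F}$ of functions $f(x_1,\dots,x_k)$
of $k$ variables with some parameter $D$ and exponent $L$, $L\ge1$, on the product
space $(X^k,\mathcal{X}^k)$ which satisfies the conditions

$$\|f\|_\infty=\sup_{x_j\in X,\ 1\le j\le k}|f(x_1,\dots,x_k)|\le1\quad\text{for all } f\in\mathcal{F} \tag{40}$$

and

$$\|f\|_2^2=\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k)\le\sigma^2\quad\text{for all } f\in\mathcal{F} \tag{41}$$

with some constant $0<\sigma\le1$. Then there exist some constants $C = C(k) > 0$,
$\alpha=\alpha(k)>0$ and $M = M(k) > 0$ depending only on the parameter $k$ such that
the supremum of the random integrals $J_{n,k}(f)$, $f\in\mathcal{F}$, defined by formula (2)
satisfies the inequality

$$P\Bigl(\sup_{f\in\mathcal{F}}|J_{n,k}(f)|\ge u\Bigr)\le CD\exp\Bigl\{-\alpha\Bigl(\frac{u}{\sigma}\Bigr)^{2/k}\Bigr\}\quad\text{if } n\sigma^2\ge\Bigl(\frac{u}{\sigma}\Bigr)^{2/k}\ge M(L+\beta)^{3/2}\log\frac{2}{\sigma}, \tag{42}$$

where $\beta=\max\bigl(\frac{\log D}{\log n},0\bigr)$ and the numbers $D$ and $L$ agree with the parameter
and exponent of the $L_2$-dense class $\mathcal{F}$.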

Theorem 6.3. Let a probability measure $\mu$ be given on a measurable space
$(X,\mathcal{X})$ together with a countable and $L_2$-dense class $\mathcal{F}$ of functions $f(x_1,\dots,x_k)$
of $k$ variables with some parameter $D$ and exponent $L$, $L\ge1$, on the product
space $(X^k,\mathcal{X}^k)$ which satisfies conditions (40) and (41) with some constant
$0<\sigma\le1$. Beside these conditions let us also assume that the $U$-statistics
$I_{n,k}(f)$ defined with the help of a sequence of independent $\mu$ distributed random
variables $\xi_1,\dots,\xi_n$ are degenerate for all $f\in\mathcal{F}$, or in an equivalent form, all
functions $f\in\mathcal{F}$ are canonical with respect to the measure $\mu$. Then there exist
some constants $C = C(k) > 0$, $\alpha=\alpha(k)>0$ and $M = M(k) > 0$ depending
only on the parameter $k$ such that the inequality

$$P\Bigl(\sup_{f\in\mathcal{F}}n^{-k/2}|I_{n,k}(f)|\ge u\Bigr)\le CD\exp\Bigl\{-\alpha\Bigl(\frac{u}{\sigma}\Bigr)^{2/k}\Bigr\}\quad\text{if } n\sigma^2\ge\Bigl(\frac{u}{\sigma}\Bigr)^{2/k}\ge M(L+\beta)^{3/2}\log\frac{2}{\sigma} \tag{43}$$

holds, where $\beta=\max\bigl(\frac{\log D}{\log n},0\bigr)$ and the numbers $D$ and $L$ agree with the
parameter and exponent of the $L_2$-dense class $\mathcal{F}$.
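The side condition $n\sigma^2\ge(u/\sigma)^{2/k}\ge M(L+\beta)^{3/2}\log\frac2\sigma$ in Theorems 6.2 and 6.3 describes an interval of admissible levels $u$. The sketch below solves it for $u$. The function name `admissible_range` is illustrative, and since the theorems only assert the existence of the constant $M=M(k)$, the value passed for `M` is a placeholder chosen by the user, not a value given in the paper.

```python
import math


def admissible_range(n, sigma, D, L, k, M):
    """Return (u_min, u_max) such that
        n*sigma^2 >= (u/sigma)^(2/k) >= M*(L+beta)^(3/2)*log(2/sigma)
    holds exactly for u_min <= u <= u_max, with beta = max(log D / log n, 0).
    Returns None if no level u satisfies the condition.
    Assumes n >= 2, D > 0 and 0 < sigma <= 1; M is a placeholder constant."""
    beta = max(math.log(D) / math.log(n), 0.0)
    lower = M * (L + beta) ** 1.5 * math.log(2.0 / sigma)
    upper = n * sigma ** 2
    if lower > upper:
        return None
    # (u/sigma)^(2/k) = x  is equivalent to  u = sigma * x^(k/2).
    return sigma * lower ** (k / 2.0), sigma * upper ** (k / 2.0)
```

The upper end of the interval equals $n^{k/2}\sigma^{k+1}$. When $n\sigma^2$ is smaller than the lower bound the function returns `None`, which reflects the fact that the theorems give no estimate for very small variances.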

The above theorems, whose proofs can be found in m or m in a more detailed
version, say that under some conditions on the class of functions $\mathcal{F}$ an
almost as good estimate holds for problems b), b') and b") as for the analogous
problems a), a') and a"), where similar problems were investigated, but
only one function $f$ was considered. An essential restriction in the results of
Theorems 6.1, 6.2 and 6.3 is that the condition $\bigl(\frac{u}{\sigma}\bigr)^{2/k}\ge M(L,D,k)\log\frac{2}{\sigma}$ is
imposed in them with some constant $M(L,D,k)$ depending on the exponent $L$ and
parameter $D$ of the $L_2$-dense class $\mathcal{F}$. In Theorem 6.1 $M(L,D,k) = ML$ was
chosen, in Theorems 6.2 and 6.3 $M(L,D,k) = M(L+\beta)^{3/2}$ with an appropriate
universal constant $M = M(k)$ and $\beta=\max\bigl(\frac{\log D}{\log n},0\bigr)$. We are interested not
so much in a good choice of the quantity $M(L,D,k)$ in these results. Actually,
they could have been chosen in a better way. We would like to understand why
such conditions have to be imposed in these results.

I shall also discuss some other questions related to the above theorems. Beside
the role of the lower bound on $\bigl(\frac{u}{\sigma}\bigr)^{2/k}$ one would also like to understand why we
have imposed the condition of $L_2$-dense property for the class of functions $\mathcal{F}$ in
Theorems 6.2 and 6.3. This is a stronger restriction than the condition about
the $L_2$-dense property of the class $\mathcal{F}$ with respect to the measure $\mu^k$ imposed
in Theorem 6.1. It may be a little bit mysterious why in Theorems 6.2 and 6.3
such a condition is needed by which this class of functions is $L_2(\nu)$-dense also
with respect to such probability measures $\nu$ which seem to have no relation to
our problems. I can give only a partial answer to this question. In the next
section I present a very brief sketch of the proofs which shows that in the
proof of Theorems 6.2 and 6.3 the $L_2$-dense property of the class of functions
$\mathcal{F}$ is applied in the form as it was imposed. I shall discuss another question
which also naturally arises in this context. One would like to know some results
which enable us to check the $L_2$-dense property and show that it holds in many
interesting cases.

I shall discuss still another problem related to the above results. One would
like to weaken the condition by which the classes of functions $\mathcal{F}$ must be
countable. Let me recall that in the Introduction I mentioned that our results can be
applied in the study of some non-parametric maximum likelihood problems. In
these applications such cases may occur where we have to work with the
supremum of non-countably many random integrals. I shall discuss this question
separately at the end of this section.
I show an example which shows that the condition $\bigl(\frac{u}{\sigma}\bigr)^{2/k}\ge M(L,D,k)\log\frac{2}{\sigma}$
with some appropriate constant $M(L,D,k) > 0$ cannot be omitted from
Theorem 6.1. In this example $([0,1],\mathcal{B})$, i.e. the interval $[0,1]$ together with the Borel
$\sigma$-algebra is taken as the measurable space $(X,\mathcal{X})$, and the Lebesgue measure $\lambda$
is considered on $[0,1]$ together with the usual white noise $\lambda_W$ with the Lebesgue
measure as its reference measure. Fix some number $\sigma>0$, and define the class
of functions of $k$ variables $\mathcal{F}=\mathcal{F}_\sigma$ on $([0,1]^k,\mathcal{B}^k)$ as the indicator functions of
the $k$-dimensional rectangles $\prod_{j=1}^k[a_j,b_j]\subset[0,1]^k$ such that all numbers $a_j$ and
$b_j$, $1\le j\le k$, are rational, and the volume of these rectangles satisfies the
condition $\prod_{j=1}^k(b_j-a_j)\le\sigma^2$. It can be seen that this countable class of functions $\mathcal{F}$ is
$L_2$-dense with respect to the measure $\lambda^k$ (moreover it is $L_2$-dense in the general
sense), hence Theorem 6.1 can be applied to the supremum of the Wiener-Ito
integrals $Z_{\lambda,k}(f)$ with the above class of functions $f\in\mathcal{F}$.

Let the above chosen number $\sigma>0$ be sufficiently small and such that $\sigma^{2/k}$
is a rational number. Let us define $N=[\sigma^{-2/k}]$ functions $f_j\in\mathcal{F}$, where $[x]$
denotes the integer part of the number $x$, in the following way: The function $f_j$
is the indicator function of the $k$-dimensional cube we get by taking the $k$-fold
direct product of the interval $[(j-1)\sigma^{2/k},j\sigma^{2/k}]$ with itself, $1\le j\le N$. Then
all functions $f_j$ are elements of the above defined class of functions $\mathcal{F}=\mathcal{F}_\sigma$,
and the Wiener-Ito integrals $Z_{\lambda,k}(f_j)$, $1\le j\le N$, are independent random
variables. Hence

$$P\Bigl(\sup_{f\in\mathcal{F}}|Z_{\lambda,k}(f)|>u\Bigr)\ge P\Bigl(\sup_{1\le j\le N}|Z_{\lambda,k}(f_j)|>u\Bigr)=1-P\bigl(|Z_{\lambda,k}(f_1)|\le u\bigr)^N \tag{44}$$

for all numbers $u>0$. I will show with the help of relation (44) that for a small
$\sigma>0$ and such a number $u$ for which $\bigl(\frac{u}{\sigma}\bigr)^{2/k}=a\log\frac{2}{\sigma}$ with some $a<\frac{4}{k}$ the
probability $P\bigl(\sup_{f\in\mathcal{F}}|Z_{\lambda,k}(f)|>u\bigr)$ is very close to 1.

By the Ito formula for multiple Wiener-Ito integrals (see e.g. M) the identity
$Z_{\lambda,k}(f_j)=\sigma H_k(\eta_j)$ holds, where $H_k(\cdot)$ is the $k$-th Hermite polynomial with
leading coefficient 1, and $\eta_j=\sigma^{-1/k}\int_{(j-1)\sigma^{2/k}}^{j\sigma^{2/k}}d\lambda_W$, hence it is a standard
normal random variable. With the help of this relation it can be shown that for
all $0<\gamma<1$ there exists some $\sigma_0=\sigma_0(\gamma)$ such that
$P(|Z_{\lambda,k}(f_1)|\le u)\le1-e^{-\gamma(u/\sigma)^{2/k}/2}$ if $0<\sigma<\sigma_0$. Hence relation (44) yields the inequality

$$P\Bigl(\sup_{f\in\mathcal{F}}|Z_{\lambda,k}(f)|>u\Bigr)\ge1-\Bigl(1-e^{-\gamma(u/\sigma)^{2/k}/2}\Bigr)^N.$$

By choosing $\gamma$ sufficiently close to 1 it can be shown with the help of the above
relation that with a sufficiently small $\sigma>0$ and the above choice of the number $u$
the probability $P\bigl(\sup_{f\in\mathcal{F}}|Z_{\lambda,k}(f)|>u\bigr)$ is almost 1.
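The lower bound in this example can be evaluated numerically for $k=1$ and $k=2$. The sketch below assumes the representation $Z_{\lambda,k}(f_j)=\sigma H_k(\eta_j)$ with independent standard normal $\eta_j$, $H_1(x)=x$ and $H_2(x)=x^2-1$, takes $N=[\sigma^{-2/k}]$ cubes, and chooses $u$ by $(u/\sigma)^{2/k}=a\log\frac2\sigma$. The function name `prob_some_cube_exceeds` is illustrative, and the restriction to $k\le2$ is a simplification of this sketch, not of the paper.

```python
import math


def prob_some_cube_exceeds(sigma, a, k):
    """Return 1 - P(|sigma*H_k(eta)| <= u)^N for a standard normal eta, where
    N = floor(sigma^(-2/k)) and (u/sigma)^(2/k) = a*log(2/sigma).
    Only k = 1 (H_1(x) = x) and k = 2 (H_2(x) = x^2 - 1) are handled."""
    if k not in (1, 2):
        raise ValueError("this sketch only covers k = 1 and k = 2")
    N = math.floor(sigma ** (-2.0 / k))
    t = (a * math.log(2.0 / sigma)) ** (k / 2.0)  # t = u / sigma
    if k == 1:
        p = math.erfc(t / math.sqrt(2.0))  # P(|eta| > t)
    else:
        # |eta^2 - 1| > t means eta^2 > 1 + t, or eta^2 < 1 - t when t < 1.
        p = math.erfc(math.sqrt((1.0 + t) / 2.0))
        if t < 1.0:
            p += math.erf(math.sqrt((1.0 - t) / 2.0))
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)^N, computed in a numerically stable way for tiny p.
    return -math.expm1(N * math.log1p(-p))
```

For $k=2$ and $\sigma=10^{-6}$ the value is very close to 1 when $a=1<4/k$, while it is very small when $a=3>4/k$. This agrees with the role of the threshold $a<4/k$ in the argument above.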
The above calculation shows that a condition of the type

$$\Bigl(\frac{u}{\sigma}\Bigr)^{2/k}\ge M(L,D,k)\log\frac{2}{\sigma}$$

is really needed in Theorem 6.1. With some extra work a similar example can be
constructed in the case of Theorem 6.2. In this example the same space $(X,\mathcal{X})$
and the same class of functions $\mathcal{F}=\mathcal{F}_\sigma$ can be chosen, only the white noise
has to be replaced for instance by a sequence of independent random variables
$\xi_1,\dots,\xi_n$ with uniform distribution on the unit interval and with a sufficiently
large sample size $n$. (The lower bound on the sample size should depend also
on $\sigma$.) Also in the case of Theorem 6.3 a similar example can be constructed. I
omit the details.

The theory of Vapnik-Červonenkis classes is a fairly popular and important
subject in probability theory. I shall show that this theory is also useful in the
study of our problems. It provides a useful sufficient condition for the $L_2$-dense
property of a class of functions, a property which played an important role in
Theorems 6.2 and 6.3. To formulate the result interesting for us first I recall the
notion of Vapnik-Červonenkis classes.

Definition of Vapnik-Červonenkis classes of sets and functions. Let a
set $S$ be given, and let us select a class $\mathcal{D}$ consisting of certain subsets of this
set $S$. We call $\mathcal{D}$ a Vapnik-Červonenkis class if there exist two real numbers $B$
and $K$ such that for all positive integers $n$ and subsets $S_0(n)=\{x_1,\dots,x_n\}\subset S$
of cardinality $n$ of the set $S$ the collection of sets of the form $S_0(n)\cap D$, $D\in\mathcal{D}$,
contains no more than $Bn^K$ subsets of $S_0(n)$. We shall call $B$ the parameter
and $K$ the exponent of this Vapnik-Červonenkis class.

A class of real valued functions $\mathcal{F}$ on a space $(Y,\mathcal{Y})$ is called a
Vapnik-Červonenkis class if the collection of graphs of these functions is a
Vapnik-Červonenkis class, i.e. if the sets $A(f)=\{(y,t)\colon y\in Y,\ \min(0,f(y))\le t\le\max(0,f(y))\}$,
$f\in\mathcal{F}$, constitute a Vapnik-Červonenkis class of subsets of the
product space $S=Y\times R^1$.
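A standard illustration of this definition (not taken from the paper) is the class of closed intervals of the real line. The sketch below enumerates by brute force all subsets that intervals cut out of a finite set of points; the function name `interval_traces` is illustrative.

```python
def interval_traces(points):
    """Return all subsets of `points` of the form S0 ∩ [a, b] with a <= b.
    `points` must be a non-empty collection of real numbers."""
    pts = sorted(set(points))
    # Only the position of a and b relative to the points matters, so it is
    # enough to try one endpoint in each gap between neighbouring points,
    # plus one endpoint on each side of the whole set.
    cuts = ([pts[0] - 1.0]
            + [(p + q) / 2.0 for p, q in zip(pts, pts[1:])]
            + [pts[-1] + 1.0])
    traces = set()
    for i, a in enumerate(cuts):
        for b in cuts[i:]:
            traces.add(frozenset(p for p in pts if a <= p <= b))
    return traces
```

For $n$ distinct points the number of traces is $\frac{n(n+1)}2+1$, so the inequality of the definition holds for the class of intervals, for example with parameter $B=2$ and exponent $K=2$.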

The theory about Vapnik-Červonenkis classes has generated a huge literature.
Many sufficient conditions have been stated which ensure that certain
classes of sets or functions are Vapnik-Červonenkis classes. Here I do not
discuss them. I only present an important result of Richard Dudley, which states
that a Vapnik-Červonenkis class of functions bounded by 1 is an $L_1$-dense class
of functions.

Theorem 6.4. Let $f(y)$, $f\in\mathcal{F}$, be a Vapnik-Červonenkis class of real valued
functions on some measurable space $(Y,\mathcal{Y})$ such that $\sup_{y\in Y}|f(y)|\le1$ for all
$f\in\mathcal{F}$. Then $\mathcal{F}$ is an $L_1$-dense class of functions on $(Y,\mathcal{Y})$. More explicitly, if $\mathcal{F}$
is a Vapnik-Červonenkis class with parameter $B\ge1$ and exponent $K>0$, then
it is an $L_1$-dense class with exponent $L = 2K$ and parameter $D = CB^2(4K)^{2K}$
with some universal constant $C > 0$.
The proof of this result can be found in m (25 Approximation Lemma)
or in my Lecture Note [22]. Formally, Theorem 6.4 gives a sufficient condition
for a class of functions to be an $L_1$-dense class. But it is fairly simple to show
that a class of functions satisfying the conditions of Theorem 6.4 is not only
an $L_1$-dense, but also an $L_2$-dense class. Indeed, an $L_1$-dense class of functions whose
absolute values are bounded by 1 in the supremum norm is also an $L_2$-dense
class, only with a possibly different exponent and parameter. I finish this section
by discussing the problem how to replace the condition of countable cardinality 
of the class of functions in Theorems lO and 1(01 by a useful weaker condition. 

6.1. On the supremum of non-countable classes of random integrals 
and U-statistics 

First I introduce the following notion. 

Definition of countably approximable classes of random variables. Let
a class of random variables $U(f)$, $f\in\mathcal{F}$, indexed by a class of functions on
a measurable space $(Y,\mathcal{Y})$ be given. We say that this class of random variables
$U(f)$, $f\in\mathcal{F}$, is countably approximable if there is a countable subset $\mathcal{F}'\subset\mathcal{F}$
such that for all numbers $u>0$ the sets $A(u)=\{\omega\colon\sup_{f\in\mathcal{F}}|U(f)(\omega)|\ge u\}$ and
$B(u)=\{\omega\colon\sup_{f\in\mathcal{F}'}|U(f)(\omega)|\ge u\}$ satisfy the identity $P(A(u)\setminus B(u))=0$.

It is fairly simple to see that in Theorems 6.1, 6.2 and 6.3 the condition
about the countable cardinality of the class of functions $\mathcal{F}$ can be replaced
by the weaker condition that the class of random variables $Z_{\mu,k}(f)$, $J_{n,k}(f)$ or
$I_{n,k}(f)$, $f\in\mathcal{F}$, is a countably approximable class of random variables. One would like
to get some results which enable us to check this property. The following simple
lemma (see Lemma 4.3 in [22]) may be useful for this.

Lemma 6.5. Let a class of random variables $U(f)$, $f\in\mathcal{F}$, indexed by some
set $\mathcal{F}$ of functions on a space $(Y,\mathcal{Y})$ be given. If there exists a countable subset
$\mathcal{F}'\subset\mathcal{F}$ of the set $\mathcal{F}$ such that the sets $A(u)=\{\omega\colon\sup_{f\in\mathcal{F}}|U(f)(\omega)|\ge u\}$ and
$B(u)=\{\omega\colon\sup_{f\in\mathcal{F}'}|U(f)(\omega)|\ge u\}$ introduced for all $u>0$ in the definition of
countable approximability satisfy the relation $A(u)\subset B(u-\varepsilon)$ for all $u>\varepsilon>0$,
then the class of random variables $U(f)$, $f\in\mathcal{F}$, is countably approximable.

The above property holds if for all $f\in\mathcal{F}$, $\varepsilon>0$ and $\omega\in\Omega$ there exists a
function $\bar f=\bar f(f,\varepsilon,\omega)\in\mathcal{F}'$ such that $|U(\bar f)(\omega)|\ge|U(f)(\omega)|-\varepsilon$.

Thus to prove the countable approximability property of a class of random
variables $U(f)$, $f\in\mathcal{F}$, it is enough to check the condition formulated in the
second paragraph of Lemma 6.5. I present an example when this condition can
be checked. This example is particularly interesting, since in the study of
non-parametric maximum likelihood problems such examples have to be considered.
Let us fix a function $f(x_1,\dots,x_k)$, $\sup|f(x_1,\dots,x_k)|\le1$, on the space
$(X^k,\mathcal{X}^k)=(R^{ks},\mathcal{B}^{ks})$ with some $s\ge1$, where $\mathcal{B}^t$ denotes the Borel $\sigma$-algebra on
the Euclidean space $R^t$, together with some probability measure $\mu$ on $(R^s,\mathcal{B}^s)$.
For all vectors $(u_1,\dots,u_k)$, $(v_1,\dots,v_k)$ such that $u_j,v_j\in R^s$ and $u_j\le v_j$,
$1\le j\le k$, (i.e. all coordinates of $u_j$ are smaller than or equal to the corresponding
coordinate of $v_j$) let us define the function $f_{u_1,\dots,u_k,v_1,\dots,v_k}$ which equals the
function $f$ on the rectangle $[u_1,v_1]\times\cdots\times[u_k,v_k]$, and it is zero outside of this
rectangle.

Let us consider a sequence of i.i.d. random variables $\xi_1,\dots,\xi_n$ taking values
in the space $(R^s,\mathcal{B}^s)$ with some distribution $\mu$, and define the empirical measure
$\mu_n$ and random integrals $J_{n,k}(f_{u_1,\dots,u_k,v_1,\dots,v_k})$ by formulas (1) and (2) for all
vectors $(u_1,\dots,u_k)$, $(v_1,\dots,v_k)$, $u_j\le v_j$ for all $1\le j\le k$, with the above
defined functions $f_{u_1,\dots,u_k,v_1,\dots,v_k}$. The following result holds (see Lemma 4.4
in [22]).

Lemma 6.6. Let us take $n$ independent and identically distributed random
variables $\xi_1,\dots,\xi_n$ with values in the space $(R^s,\mathcal{B}^s)$. Let us define with the help
of their distribution $\mu$ and the empirical distribution $\mu_n$ determined by them
the class of random variables $J_{n,k}(f_{u_1,\dots,u_k,v_1,\dots,v_k})$ introduced in formula (2),
where the class of kernel functions $\mathcal{F}$ in these integrals consists of all functions
$f_{u_1,\dots,u_k,v_1,\dots,v_k}\in\mathcal{F}$, $u_j,v_j\in R^s$, $u_j\le v_j$, $1\le j\le k$, introduced in
the last but one paragraph. This class of random variables $J_{n,k}(f)$, $f\in\mathcal{F}$, is
countably approximable.

Let me also remark that the class of functions $f_{u_1,\dots,u_k,v_1,\dots,v_k}$ is also an
$L_2$-dense class of functions, actually it is also a Vapnik-Červonenkis class of
functions. As a consequence, Theorem 6.2 can be applied to this class of
functions.

To clarify the background of the above results I make the following remark.
The class of random variables $Z_{\mu,k}(f)$, $J_{n,k}(f)$ or $I_{n,k}(f)$, $f\in\mathcal{F}$, can be
considered as a stochastic process indexed by the functions $f\in\mathcal{F}$, and we estimate the
supremum of this stochastic process. In the study of a stochastic process with a
large parameter set one introduces some smoothness type property of the
trajectories which can be satisfied. Here we followed a very similar approach. The
condition formulated in the second paragraph of Lemma 6.5 can be considered
as the smoothness type property needed in our problem.

In the study of a general stochastic process one has to make special efforts
to find its right version with sufficiently smooth trajectories. In the case of
the random processes $J_{n,k}(f)$ or $I_{n,k}(f)$, $f\in\mathcal{F}$, this right version can be
constructed in a natural, simple way. A finite sequence of random variables
$\xi_1(\omega),\dots,\xi_n(\omega)$ is given at the start, and the random integrals $J_{n,k}(f)(\omega)$ or
$U$-statistics $I_{n,k}(f)(\omega)$, $f\in\mathcal{F}$, can be constructed separately for all $\omega\in\Omega$ on
the probability field $(\Omega,\mathcal{A},P)$ where the random variables $\xi_1(\omega),\dots,\xi_n(\omega)$ are
living. It has to be checked whether the ‘trajectories’ of this random process have
the ‘smoothness properties’ necessary for us. The case of a class of Wiener-Ito
integrals $Z_{\mu,k}(f)$, $f\in\mathcal{F}$, is different. Wiener-Ito integrals are defined with the
help of some $L_2$-limit procedure. Hence each random integral $Z_{\mu,k}(f)$ is defined
only with probability 1, and in the case of a non-countable set of functions $\mathcal{F}$
the right version $Z_{\mu,k}(f)$, $f\in\mathcal{F}$, of the Wiener-Ito integrals has to be found to
get a countably approximable class of random variables.

R. M. Dudley (see e.g. |S]) worked out a rather deep theory to overcome the 
measurability difficulties appearing in the case of a non-countable set of random 
variables by working with analytic sets, Suslin property, outer probability, and 
so on. I must admit that I do not know the precise relation between this theory 
and our method. At any rate, in the problems discussed here our elementary 
approach seems to be satisfactory. 

In the next two sections I discuss the idea of the proof of Theorems 6.1,
6.2 and 6.3. A simple and natural approach, the so-called chaining argument,
suffices to prove Theorem 6.1. In the case of Theorems 6.2 and 6.3 this chaining
argument can only help to reduce the proof to a slightly weaker statement,
and we apply an essentially different method based on some randomization
arguments to complete the proof. Since in the multivariate case $k\ge2$ some
essential additional difficulties appear, it seemed to be more natural to discuss
it in a separate section.

7. The method of proof of Theorems 6.1, 6.2 and 6.3

There is a simple but useful method, called the chaining argument, which helps
to prove Theorem 6.1. It suggests taking an appropriate increasing sequence
$\mathcal{F}_j$, $j = 0,1,\dots$, of $L_2$-dense subsets of the class of functions $\mathcal{F}$ and estimating
the supremum of the Wiener-Ito integrals $Z_{\mu,k}(f)$, $f\in\mathcal{F}_j$, for all $j = 0,1,\dots$.

In the application of this method first we define a sequence of subclasses $\mathcal{G}_j$
of $\mathcal{F}$, $j = 0,1,2,\dots$, such that $\mathcal{G}_j=\{g_{j,1},\dots,g_{j,m_j}\}\subset\mathcal{F}$ is a $2^{-jk}\sigma$-dense
subset of $\mathcal{F}$ in the $L_2(\mu^k)$-norm, i.e. they satisfy the relation

$$\inf_{1\le l\le m_j}\rho(f,g_{j,l})^2=\inf_{1\le l\le m_j}\int\bigl(f(x_1,\dots,x_k)-g_{j,l}(x_1,\dots,x_k)\bigr)^2\mu(dx_1)\dots\mu(dx_k)\le2^{-2jk}\sigma^2 \tag{45}$$

for all $f\in\mathcal{F}$, and also the inequality $m_j\le D2^{jkL}\sigma^{-L}$ holds. Such sets $\mathcal{G}_j$
exist because of the conditions of Theorem 6.1. Let us also define the classes of
functions $\mathcal{F}_j=\bigcup_{p=0}^{j}\mathcal{G}_p$, and sets

$$B_j=B_j(u)=\Bigl\{\omega\colon\sup_{f\in\mathcal{F}_j}|Z_{\mu,k}(f)(\omega)|\ge u\bigl(1-2^{-k(j+1)/2}\bigr)\Bigr\},\quad j=0,1,2,\dots.$$

Given a function $f_{j+1,l}\in\mathcal{G}_{j+1}$ let us choose such a function $f_{j,l'}\in\mathcal{F}_j$ with some
$l'=l'(l)$ for which $\rho(f_{j,l'},f_{j+1,l})\le2^{-jk}\sigma$ with the function $\rho(f,g)$ defined in
formula (45). Then

$$P(B_{j+1})\le P(B_j)+\sum_{l=1}^{m_{j+1}}P\Bigl(|Z_{\mu,k}(f_{j+1,l}-f_{j,l'})|>u2^{-k(j+1)/2}\bigl(1-2^{-k/2}\bigr)\Bigr). \tag{46}$$

Theorem oi yields a good estimate of the terms in the sum at the right-hand
side of (46), and it also provides a good bound of the probability $P(B_0)$. With
the help of some small modification of the construction it can be achieved that
also the relation $\bigcup_{j=0}^{\infty}\mathcal{F}_j=\mathcal{F}$ holds. The proof of Theorem 6.1 follows from the
estimates obtained in such a way.

Theorem 6.2 can be deduced from Theorem 6.3 relatively simply with the
help of Theorem E31 since Theorem 6.3 enables us to give a good bound on all
terms in the sum at the right-hand side of formula m. The only non-trivial
step in this argument is to show that the set of functions $f_V$, $f\in\mathcal{F}$, appearing
in formula (1 1911 satisfy the estimates needed in the application of Theorem 6.3.
Relations and eu are parts of the needed estimates. Beside this, it has
to be shown that if $\mathcal{F}$ is an $L_2$-dense class of functions, then the same relation
holds for the classes of functions $\mathcal{F}_V=\{f_V\colon f\in\mathcal{F}\}$ for all sets $V\subset\{1,\dots,k\}$.
This relation can also be shown with the help of a not too difficult proof (see m
or mi), but this question will not be discussed here.

One may try to prove Theorem 6.3 similarly to Theorem 6.1 with the help
of the chaining argument. But this method does not work well in this case.
The reason for its weakness is that the tail distribution of a degenerate
$U$-statistic with a small variance $\sigma^2$ does not satisfy such a good estimate as the
tail distribution of a multiple Wiener-Ito integral. At this point the condition
$u\le n^{k/2}\sigma^{k+1}$ in Theorem 4.2 plays an important role. Let us recall that, as
Example ^21 shows, the tail distribution of the normalized degenerate $U$-statistics
$n^{-k/2}I_{n,k}(f)$ satisfies only a relatively weak estimate at level $u$ if $u\gg n^{k/2}\sigma^{k+1}$.
We may try to work with an estimate analogous to relation (46) in the proof of
Theorem 6.3. But the probabilities appearing at the right-hand side of such an
estimate cannot be well estimated for large indices $j$.

Thus we can start the procedure of the chaining argument, but after finitely
many steps we have to stop it. In such a way we can find a relatively dense
subset $\mathcal{F}_0\subset\mathcal{F}$ (in $L_2(\mu^k)$ norm) such that a good estimate can be given for
the distribution of the supremum $\sup_{f\in\mathcal{F}_0}I_{n,k}(f)$. This result enables us to reduce
Theorem 6.3 to a slightly weaker statement formulated in Proposition 7.1 below,
but it yields no more help. Nevertheless, such a reduction is useful.

Proposition 7.1. Let us have a probability measure $\mu$ on a measurable space
$(X,\mathcal{X})$ together with a sequence of independent and $\mu$ distributed random
variables $\xi_1,\dots,\xi_n$ and a countable $L_2$-dense class $\mathcal{F}$ of canonical kernel functions
$f=f(x_1,\dots,x_k)$ (with respect to the measure $\mu$) with some parameter $D$ and
exponent $L$ on the product space $(X^k,\mathcal{X}^k)$ such that all functions $f\in\mathcal{F}$ satisfy
conditions (40) and (41) with some $0<\sigma\le1$. Let us consider the (degenerate)
$U$-statistics $I_{n,k}(f)$ with the random sequence $\xi_1,\dots,\xi_n$ and kernel functions
$f\in\mathcal{F}$. There exists a sufficiently large constant $K = K(k)$ together with some
numbers $C = C(k) > 0$, $\gamma=\gamma(k)>0$ and threshold index $A_0=A_0(k)>0$
depending only on the order $k$ of the $U$-statistics such that if $n\sigma^2>K(L+\beta)\log n$
with $\beta=\max\bigl(\frac{\log D}{\log n},0\bigr)$, then the degenerate $U$-statistics $I_{n,k}(f)$, $f\in\mathcal{F}$, satisfy
the inequality

$$P\Bigl(\sup_{f\in\mathcal{F}}|n^{-k/2}I_{n,k}(f)|\ge An^{k/2}\sigma^{k+1}\Bigr)\le Ce^{-\gamma A^{1/(2k)}n\sigma^2}\quad\text{if } A\ge A_0. \tag{47}$$

The statement of Proposition 7.1 is similar to that of Theorem 6.3. The
essential difference between them is that Proposition 7.1 yields an estimate only
for $u\ge A_0n^{k/2}\sigma^{k+1}$ with a sufficiently large constant $A_0$, i.e. for relatively
large numbers $u$. In the case $u\gg n^{k/2}\sigma^{k+1}$ it yields a weaker estimate than
formula (43) in Theorem 6.3, but actually we need this estimate only in the
case of the number $A$ in formula (47) being bounded away both from zero and
infinity.

The proof of Proposition 7.1, briefly explained below, is based on an inductive
procedure carried out by means of a symmetrization argument. In each step
of this induction we diminish the number $A_0$ for which we show that
inequality (47) holds for all numbers $An^{k/2}\sigma^{k+1}$ with $A\ge A_0$. This diminishing of the
number $A_0$ is done as long as it is possible. It has to be stopped at such a
number $A_0$ for which the probability $P\bigl(n^{-k/2}|I_{n,k}(f)|\ge A_0n^{k/2}\sigma^{k+1}\bigr)$ can be well
estimated by Theorem 4.3 for all functions $f\in\mathcal{F}$. This has the consequence
that Proposition 7.1 yields just such a strong estimate which is needed to reduce
the proof of Theorem 6.3 to a statement that can be proved by means of the
chaining argument.

In the symmetrization argument applied in the proof of Proposition 7.1
several additional difficulties arise if the multivariate case $k\ge2$ is considered.
Hence in this section only the case $k = 1$ is discussed. A degenerate $U$-statistic
$I_{n,1}(f)$ of order 1 is the sum of independent, identically distributed random
variables with expectation zero. In this paper the proof of Proposition 7.1 will be
only briefly explained. A detailed proof can be found in m or [22]. Let me also
remark that the method of these works was taken from Alexander’s paper [2],
where all ideas appeared in a different context.

We shall bound the probability appearing at the left-hand side of (47) (if $k = 1$)
from above by the probability of the event that the supremum of appropriate
randomized sums is larger than some number. We apply a symmetrization
method which means that we estimate the expression we want to bound by
means of a randomized (symmetrized) expression. Lemma 7.2, formulated below,
has such a character.

Lemma 7.2. Let a countable class of functions $\mathcal{F}$ on a measurable space $(X,\mathcal{X})$
and a real number $0<\sigma<1$ be given. Consider a sequence of independent,
identically distributed $X$-valued random variables $\xi_1,\dots,\xi_n$ such that $Ef(\xi_1)=0$,
$Ef^2(\xi_1)\le\sigma^2$ for all $f\in\mathcal{F}$ together with another sequence $\varepsilon_1,\dots,\varepsilon_n$ of
independent random variables with distribution $P(\varepsilon_j=1)=P(\varepsilon_j=-1)=\frac12$,
$1\le j\le n$, independent also of the random sequence $\xi_1,\dots,\xi_n$. Then

$$P\Bigl(\frac{1}{\sqrt n}\sup_{f\in\mathcal{F}}\Bigl|\sum_{j=1}^nf(\xi_j)\Bigr|\ge An^{1/2}\sigma^2\Bigr)\le4P\Bigl(\frac{1}{\sqrt n}\sup_{f\in\mathcal{F}}\Bigl|\sum_{j=1}^n\varepsilon_jf(\xi_j)\Bigr|\ge\frac{A}{3}n^{1/2}\sigma^2\Bigr)\quad\text{if } A\ge\frac{3\sqrt2}{\sqrt n\,\sigma}. \tag{48}$$


Let us first understand why Lemma 7.2 can help in the proof of
Proposition 7.1. It enables us to reduce the estimate of the probability at the left-hand
side of formula (48) to that at its right-hand side. This reduction turned out
to be useful for the following reason. At the right-hand side of formula (48) the
probability of such an event appears which depends on the random variables
$\xi_1,\dots,\xi_n$ and some randomizing terms $\varepsilon_1,\dots,\varepsilon_n$. Let us estimate the
probability of this event by bounding first its conditional probability under the condition
that the values of the random variables $\xi_1,\dots,\xi_n$ are prescribed. These
conditional probabilities can be well estimated by means of Hoeffding’s inequality
formulated below, and the estimates we get for them also yield a good bound
on the expression at the right-hand side of (48).

Hoeffding’s inequality (see e.g. |2H| pp. 191-192), more precisely its special
case we need here, states that the linear combinations of independent random
variables $\varepsilon_j$, $P(\varepsilon_j = 1) = P(\varepsilon_j = -1) = \frac12$, $1 \le j \le n$, behave as the central
limit theorem suggests. More explicitly, the following inequality holds.

Theorem 7.3 (Hoeffding’s inequality). Let $\varepsilon_1,\dots,\varepsilon_n$ be independent random
variables, $P(\varepsilon_j = 1) = P(\varepsilon_j = -1) = \frac12$, $1 \le j \le n$, and let $a_1,\dots,a_n$ be
arbitrary real numbers. Put $V = \sum_{j=1}^n a_j\varepsilon_j$. Then

$$P(V > y) \le \exp\left\{-\frac{y^2}{2\sum_{j=1}^n a_j^2}\right\} \quad \text{for all } y > 0. \qquad (49)$$
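
The content of Hoeffding’s inequality can be checked numerically for a small number of terms. The following is only a minimal sketch: it enumerates all sign patterns, computes the exact tail probability of $V = \sum a_j\varepsilon_j$, and compares it with the right-hand side of (49). The coefficient list and the names `rademacher_tail` and `hoeffding_bound` are illustrative assumptions, they do not come from the paper.

```python
from itertools import product
from math import exp


def rademacher_tail(a, y):
    """Exact P(V > y) for V = sum_j a_j * eps_j with independent fair signs eps_j."""
    outcomes = list(product((-1, 1), repeat=len(a)))
    hits = sum(1 for signs in outcomes
               if sum(s * c for s, c in zip(signs, a)) > y)
    return hits / len(outcomes)


def hoeffding_bound(a, y):
    """Right-hand side of (49): exp(-y^2 / (2 * sum_j a_j^2))."""
    return exp(-y * y / (2.0 * sum(c * c for c in a)))


# Hypothetical coefficients, chosen only for illustration.
coefficients = [1.0, 0.5, 2.0, 1.5, 0.25, 1.0, 0.75, 3.0]
for level in (0.5, 2.0, 4.0, 6.0, 9.0):
    # Each line shows the level y, the exact tail and the Hoeffding bound.
    print(level, rademacher_tail(coefficients, level), hoeffding_bound(coefficients, level))
```

The enumeration has $2^n$ terms, so it is feasible only for small $n$; its role is to make the Gaussian-type decay in (49) visible, not to replace the proof.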

As we shall see, the application of Lemma 7.2 together with the above mentioned
conditioning argument and Hoeffding’s inequality enables us to reduce
the estimation of the distribution of $\sup_{f\in\mathcal{F}}\sum_{j=1}^n f(\xi_j)$ to that of
$\sup_{f\in\mathcal{F}}\sum_{j=1}^n f^2(\xi_j)$, that is, essentially to that of the supremum of the
centered sums $\sum_{j=1}^n [f^2(\xi_j) - Ef^2(\xi_j)]$. At first sight it may seem that
we did not gain very much by applying this approach. The estimation of the
supremum of a class of sums of independent and identically distributed random
variables was replaced by the estimation of a similar supremum. But a closer
look shows that this method can help us in finding a proof of Proposition 7.1. We
have to follow at what level we wanted to bound the distribution of the supremum
in the original problem, and what level we have to choose in the modified
problem to get a good estimate in the problem we are interested in. It turns
out that in the second problem we need a good estimate about the distribution
of the supremum of a class of sums of independent and identically distributed
random variables at a considerably higher level. This observation enables us to
work out an inductive procedure which leads to the proof of Proposition 7.1.

Indeed, in Proposition 7.1 estimate (47) has to be proved for all numbers
$A \ge A_0$ with some appropriate number $A_0$. This estimate trivially holds if
$A_0 > \sigma^{-2}$, because in this case the condition about the supremum norm of the
functions $f \in \mathcal{F}$ implies that the probability at the left-hand side of (47) equals
zero. The argument of the previous paragraph suggests the following statement:
If relation (47) holds for some constant $A_0$, then it also holds for a smaller
constant. Hence Proposition 7.1 can be proved by means of an inductive procedure
in which the number $A_0$ is diminished at each step.

The actual proof consists of an elaboration of the details in the above heuristic
approach. An inductive procedure is applied in which it is shown that if relation
(47) holds with some number $A_0$ for a class of functions $\mathcal{F}$ satisfying the
conditions of Proposition 7.1, then this relation also holds for it if $A_0$ is replaced by
$A_0^{3/4}$, provided that $A_0$ is larger than some fixed universal constant. I would like
to emphasize that we prove this statement not only for the class of functions
we are interested in, but simultaneously for all classes of functions which satisfy
the conditions of Proposition 7.1. When we want to prove the inductive statement
for a class of functions $\mathcal{F}$, we apply our previous information not to this
class, but to another appropriately defined class of functions $\mathcal{F}' = \mathcal{F}'(\mathcal{F})$ which
also satisfies the conditions of Proposition 7.1. I omit the details of the proof,
I only discuss one point which deserves special attention.

Hoeffding’s inequality, applied in the justification of the inductive procedure
leading to the proof of Proposition 7.1, gives an estimate for the distribution
of a single sum, while we need a good estimate on the supremum of a class of
sums. The question may arise whether this causes some problem in the
proof. I briefly explain that the condition about the $L_2$-dense property of the
class $\mathcal{F}$ was introduced precisely to overcome this difficulty.

In the inductive procedure we want to prove that relation (47) holds for all
$A \ge A_0^{3/4}$ if it holds for all $A \ge A_0$. It can be shown by means of the inductive
assumption which states that relation (47) holds for $A \ge A_0$ and Hoeffding’s
inequality, Theorem 7.3, that there is a set $D \subset \Omega$ such that the conditional
probabilities


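$$P\left(\frac{1}{\sqrt n}\left|\sum_{j=1}^n \varepsilon_j f(\xi_j)\right| \ge \frac{A}{6} n^{1/2}\sigma^2 \,\Bigg|\, \xi_1(\omega),\dots,\xi_n(\omega)\right) \qquad (50)$$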


are very small for all $f \in \mathcal{F}$, and the probability of the set $\Omega \setminus D$ is negligibly
small. Let me emphasize that at this step of the proof we can give a good
estimate about the conditional probability in (50) for all functions $f \in \mathcal{F}$ if
$\omega \in D$, but we cannot work with their supremum which we would need to apply
formula (48). This difficulty can be overcome with the help of the following
argument.

Let us introduce the (random) probability measure $\nu = \nu(\omega)$ uniformly
distributed on the points $\xi_1(\omega),\dots,\xi_n(\omega)$ for all $\omega \in D$. Let us observe that the
(random) measure $\nu$ has a support consisting of $n$ points, and the $\nu$-measure
of all points in the support of $\nu$ equals $\frac1n$. This implies that the supremum of a
function defined on the support of the measure $\nu$ can be bounded by means of
the $L_2(\nu)$-norm of this function. This property together with the $L_2$-dense
property of the class of functions $\mathcal{F}$ imposed in the conditions of Proposition 7.1
imply that a finite set $\{f_1,\dots,f_m\} \subset \mathcal{F}$ can be chosen with relatively few
elements $m$ in such a way that for all $f \in \mathcal{F}$ there is some function $f_l$, $1 \le l \le m$,
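whose distance from the function $f$ in the $L_2(\nu)$ norm is less than $\frac{A}{6}\sigma^2$, hence

$$\sup_{f\in\mathcal{F}}\frac{1}{\sqrt n}\left|\sum_{j=1}^n \varepsilon_j f(\xi_j(\omega))\right| \le \max_{1\le l\le m}\frac{1}{\sqrt n}\left|\sum_{j=1}^n \varepsilon_j f_l(\xi_j(\omega))\right| + \frac{A}{6} n^{1/2}\sigma^2.$$

The condition that $\mathcal{F}$ is $L_2$-dense with exponent $L$ and parameter $D$ enables us to give
a good upper bound on the number $m$. This is the point where the condition
that the class of functions $\mathcal{F}$ is $L_2$-dense was exploited in its full strength. Since
we can give a good bound on the conditional probability in (50) for all functions
$f = f_l$, $1 \le l \le m$, we can bound the probability at the right-hand side of (48).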


It turns out that the estimate we get in such a way is sufficiently sharp, and the
inductive statement, hence also Proposition 7.1, can be proved by working out
the details.
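
The approximation step can be illustrated by a small numerical experiment. The sketch below assumes a hypothetical class of functions (truncations $x \mapsto \min(x,t) - t/2$), a hypothetical sample and a greedy choice of the finite subclass; none of these come from the paper. It uses the elementary fact that if $f$ and $f_l$ are closer than $\delta$ in the $L_2(\nu)$ norm, then $n^{-1/2}|\sum \varepsilon_j (f - f_l)(\xi_j)| \le \sqrt n\,\delta$, since the $L_1(\nu)$ norm is bounded by the $L_2(\nu)$ norm.

```python
import random
from math import sqrt


def l2_dist(g, h):
    """L2(nu)-distance of two functions given by their values at the n sample
    points, nu being the uniform probability measure on these points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(g, h)) / len(g))


def greedy_net(functions, delta):
    """Pick a subclass such that every function is within delta of a chosen one."""
    net = []
    for g in functions:
        if all(l2_dist(g, h) > delta for h in net):
            net.append(g)
    return net


def randomized_sum(g, eps):
    """n^{-1/2} * |sum_j eps_j * g(xi_j)| for a function given by its sample values."""
    return abs(sum(e * v for e, v in zip(eps, g))) / sqrt(len(g))


random.seed(0)
n, delta = 50, 0.1
points = [random.random() for _ in range(n)]          # stand-ins for xi_1(omega),...,xi_n(omega)
levels = [i / 200 for i in range(201)]
family = [[min(x, t) - t / 2 for x in points] for t in levels]   # hypothetical class
eps = [random.choice((-1, 1)) for _ in range(n)]

net = greedy_net(family, delta)
lhs = max(randomized_sum(g, eps) for g in family)               # supremum over the class
rhs = max(randomized_sum(h, eps) for h in net) + sqrt(n) * delta  # maximum over the net + error
print(len(family), len(net), lhs <= rhs + 1e-12)
```

The supremum over the whole class is thus controlled by a maximum over the few functions of the net, to which the estimate for a single sum can be applied one function at a time.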

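I briefly explain the proof of Lemma 7.2. The randomizing terms $\varepsilon_j$, $1 \le j \le n$,
in it can be introduced with the help of the following simple lemma.

Lemma 7.4. Let $\xi_1,\dots,\xi_n$ and $\bar\xi_1,\dots,\bar\xi_n$ be two sequences of independent and
identically distributed random variables with the same distribution $\mu$ on some
measurable space $(X, \mathcal{X})$, independent of each other. Let $\varepsilon_1,\dots,\varepsilon_n$ be a sequence
of independent random variables, $P(\varepsilon_j = 1) = P(\varepsilon_j = -1) = \frac12$, $1 \le j \le n$,
which is independent of the random sequences $\xi_1,\dots,\xi_n$ and $\bar\xi_1,\dots,\bar\xi_n$. Take
a countable set of functions $\mathcal{F}$ on the space $(X, \mathcal{X})$. Then the set of random
variables

$$\left\{\frac{1}{\sqrt n}\sum_{j=1}^n \left(f(\xi_j) - f(\bar\xi_j)\right),\ f \in \mathcal{F}\right\}$$

and its randomized version

$$\left\{\frac{1}{\sqrt n}\sum_{j=1}^n \varepsilon_j\left(f(\xi_j) - f(\bar\xi_j)\right),\ f \in \mathcal{F}\right\}$$

have the same joint distribution.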
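
For a distribution concentrated on finitely many points the statement of Lemma 7.4 can be verified exactly by enumeration. The following sketch assumes that $\mu$ is the uniform distribution on $\{0,1,2\}$, takes $n = 3$ and three illustrative functions; these choices and the name `joint_law` are assumptions made only for the example. The normalization $n^{-1/2}$ is omitted, since it does not influence whether the two joint laws agree.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def joint_law(n, values, functions, randomized):
    """Exact joint law of the vector (S_f), f in functions, where
    S_f = sum_j (f(xi_j) - f(xibar_j))          if not randomized,
    S_f = sum_j eps_j * (f(xi_j) - f(xibar_j))  if randomized.
    xi_j and xibar_j are i.i.d. uniform on `values`, eps_j are fair signs."""
    sign_choices = list(product((-1, 1), repeat=n)) if randomized else [(1,) * n]
    mass = Fraction(1, len(values) ** (2 * n) * len(sign_choices))
    law = Counter()
    for xi in product(values, repeat=n):
        for xibar in product(values, repeat=n):
            for eps in sign_choices:
                key = tuple(sum(e * (f(x) - f(xb)) for e, x, xb in zip(eps, xi, xibar))
                            for f in functions)
                law[key] += mass
    return law


# Hypothetical finite class of functions on {0, 1, 2}.
functions = [lambda x: x, lambda x: x * x, lambda x: int(x == 0)]
plain = joint_law(3, (0, 1, 2), functions, randomized=False)
symmetrized = joint_law(3, (0, 1, 2), functions, randomized=True)
print(plain == symmetrized)
```

The agreement of the two laws reflects the fact that exchanging $\xi_j$ and $\bar\xi_j$ for those indices $j$ where $\varepsilon_j = -1$ does not change the joint distribution of the two sequences.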

Lemma 7.2 can be proved by means of Lemma 7.4 and some calculations.
There is one harder step in the calculations. A probability of the type

$$P\left(\frac{1}{\sqrt n}\sup_{f\in\mathcal{F}}\left|\sum_{j=1}^n f(\xi_j)\right| > u\right)$$

has to be bounded from above by means of a probability of the type

$$P\left(\frac{1}{\sqrt n}\sup_{f\in\mathcal{F}}\left|\sum_{j=1}^n \left(f(\xi_j) - f(\bar\xi_j)\right)\right| > u - K\right)$$

with some number $K > 0$. (Here the notation of Lemma 7.4 is applied.) At this
point the following symmetrization lemma may be useful.

Lemma 7.5 (Symmetrization Lemma). Let $Z_p$ and $\bar Z_p$, $p = 1, 2, \dots$, be two
sequences of random variables independent of each other, and let the random
variables $\bar Z_p$, $p = 1, 2, \dots$, satisfy the inequality

$$P(|\bar Z_p| \le \alpha) \ge \beta \quad \text{for all } p = 1, 2, \dots \qquad (51)$$

with some numbers $\alpha > 0$ and $\beta > 0$. Then

$$P\left(\sup_{1\le p<\infty} |Z_p| > \alpha + u\right) \le \frac{1}{\beta} P\left(\sup_{1\le p<\infty} |Z_p - \bar Z_p| > u\right)$$

for all $u > 0$.


The proof of Lemma 7.5 can be found for instance in | 2 d| (8 Symmetrization
Lemma) or in [22], Lemma 7.1.

Let us list the elements of the countable class of functions $\mathcal{F}$ in Lemma 7.2
in the form $\mathcal{F} = \{f_1, f_2, \dots\}$. Then Lemma 7.2 can be proved by means of
Lemmas 7.4 and 7.5 with the choice of the random variables

$$Z_p = \frac{1}{\sqrt n}\sum_{j=1}^n f_p(\xi_j), \qquad \bar Z_p = \frac{1}{\sqrt n}\sum_{j=1}^n f_p(\bar\xi_j), \qquad p = 1, 2, \dots. \qquad (52)$$

I omit the details.

One may try to generalize the above sketched proof of Theorem 6.3 to the
multivariate case $k \ge 2$. Here the question arises how to generalize Lemma 7.2
to the multivariate case and how to prove this generalization. These are highly
non-trivial problems. This will be the main subject of the next section.


8. On the proof of Theorem 6.3 in the multivariate case

Here we are mainly interested in the question how to carry out the symmetrization
procedure in the proof of Proposition 7.1 in the multivariate case $k \ge 2$. It
turned out that it is possible to reduce this problem to the investigation of modified
$U$-statistics, where $k$ independent copies of the original random sequence
are taken and put into the $k$ different arguments of the kernel function of the
$U$-statistic of order $k$. Such modified versions of $U$-statistics are called decoupled
$U$-statistics in the literature, and they can be better studied by means of the
symmetrization argument we are going to apply. To give a precise meaning to
the above statements some definitions have to be introduced and some results
have to be formulated. I introduce the following notions.

The definition of decoupled and randomized decoupled $U$-statistics.
Let us have $k$ independent copies $\xi_1^{(j)},\dots,\xi_n^{(j)}$, $1 \le j \le k$, of a sequence
$\xi_1,\dots,\xi_n$ of independent and identically distributed random variables taking
their values in a measurable space $(X, \mathcal{X})$ together with a measurable function
$f(x_1,\dots,x_k)$ on the product space $(X^k, \mathcal{X}^k)$ with values in a separable
Banach space. Then the decoupled $U$-statistic determined by the random
sequences $\xi_1^{(j)},\dots,\xi_n^{(j)}$, $1 \le j \le k$, and kernel function $f$ is defined by the formula

$$\bar I_{n,k}(f) = \frac{1}{k!}\sum_{\substack{1\le l_j\le n,\ j=1,\dots,k \\ l_j\ne l_{j'} \text{ if } j\ne j'}} f\left(\xi_{l_1}^{(1)},\dots,\xi_{l_k}^{(k)}\right). \qquad (53)$$

Let us have beside the sequences $\xi_1^{(j)},\dots,\xi_n^{(j)}$, $1 \le j \le k$, and function
$f(x_1,\dots,x_k)$ a sequence of independent random variables $\varepsilon = (\varepsilon_1,\dots,\varepsilon_n)$,
$P(\varepsilon_l = 1) = P(\varepsilon_l = -1) = \frac12$, $1 \le l \le n$, which is independent also of the
sequences of random variables $\xi_1^{(j)},\dots,\xi_n^{(j)}$, $1 \le j \le k$. We define the
randomized decoupled $U$-statistic determined by the random sequences $\xi_1^{(j)},\dots,\xi_n^{(j)}$,
$1 \le j \le k$, the kernel function $f$ and the randomizing sequence $\varepsilon_1,\dots,\varepsilon_n$ by the
formula

$$\bar I_{n,k}^{\varepsilon}(f) = \frac{1}{k!}\sum_{\substack{1\le l_j\le n,\ j=1,\dots,k \\ l_j\ne l_{j'} \text{ if } j\ne j'}} \varepsilon_{l_1}\cdots\varepsilon_{l_k}\, f\left(\xi_{l_1}^{(1)},\dots,\xi_{l_k}^{(k)}\right). \qquad (54)$$
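
The two definitions can be made concrete by a direct computation. The sketch below assumes a real-valued kernel and evaluates the sum over all $k$-tuples of pairwise distinct indices, with the normalization $1/k!$; the function name `decoupled_u_statistic`, the product kernel and the sample values are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def decoupled_u_statistic(kernel, samples, eps=None):
    """(1/k!) * sum over pairwise distinct indices l_1,...,l_k of
    eps[l_1] * ... * eps[l_k] * kernel(samples[0][l_1], ..., samples[k-1][l_k]).
    `samples` is a list of k sequences of common length n, one for each argument
    of the kernel. With eps=None no random signs are used (decoupled U-statistic),
    otherwise eps is the randomizing sequence (randomized decoupled U-statistic)."""
    k, n = len(samples), len(samples[0])
    total = Fraction(0)
    for idx in permutations(range(n), k):      # all k-tuples of pairwise distinct indices
        term = Fraction(kernel(*(samples[j][idx[j]] for j in range(k))))
        if eps is not None:
            for l in idx:
                term *= eps[l]
        total += term
    return total / factorial(k)


# Hypothetical data: k = 2, n = 3, product kernel f(x, y) = x * y.
first_copy, second_copy = [1, 2, 3], [4, 5, 6]
signs = [1, -1, 1]
print(decoupled_u_statistic(lambda x, y: x * y, [first_copy, second_copy]))
print(decoupled_u_statistic(lambda x, y: x * y, [first_copy, second_copy], eps=signs))
```

For the product kernel the results can be checked by hand: the sum over $l_1 \ne l_2$ equals the product of the two sums minus the diagonal terms.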


Our first goal is to reduce the study of inequality (47) in Proposition 7.1 to an
analogous problem about the supremum of decoupled $U$-statistics defined above.
Then we want to show that a symmetrization argument enables us to reduce
this problem to the study of randomized decoupled $U$-statistics introduced in
formula (54). A result of de la Peña and Montgomery-Smith formulated below
helps to carry out such a program. Let me remark that both in the definition of
decoupled $U$-statistics and in the result of de la Peña and Montgomery-Smith
functions $f$ taking their values in a separable Banach space were considered,
i.e. we did not restrict our attention to real-valued functions. This choice was
motivated by the fact that in such a general setting we can get a simpler proof
of inequality (56) presented below. (The definition of $U$-statistics given earlier
is also meaningful in the case of Banach-space valued functions $f$.)

Theorem 8.1 (de la Peña and Montgomery-Smith). Let us consider a
sequence of independent and identically distributed random variables $\xi_1,\dots,\xi_n$
on a measurable space $(X, \mathcal{X})$ together with $k$ independent copies
$\xi_1^{(j)},\dots,\xi_n^{(j)}$, $1 \le j \le k$. Let us also have a function $f(x_1,\dots,x_k)$ on the $k$-fold
product space $(X^k, \mathcal{X}^k)$ which takes its values in a separable Banach space $B$.
Define the $U$-statistic and decoupled $U$-statistic $I_{n,k}(f)$ and $\bar I_{n,k}(f)$ with the help
of the above random sequences $\xi_1,\dots,\xi_n$, $\xi_1^{(j)},\dots,\xi_n^{(j)}$, $1 \le j \le k$, and kernel
function $f$. There exist some constants $C = C(k) > 0$ and $\gamma = \gamma(k) > 0$ depending
only on the order $k$ of the $U$-statistic such that

$$P\left(\|I_{n,k}(f)\| > u\right) \le C P\left(\|\bar I_{n,k}(f)\| > \gamma u\right) \qquad (55)$$

for all $u > 0$. Here $\|\cdot\|$ denotes the norm in the Banach space $B$ where the
function $f$ takes its values.

More generally, if we have a countable sequence of functions $f_s$, $s = 1, 2, \dots$,
taking their values in the same separable Banach space, then

$$P\left(\sup_{1\le s<\infty}\|I_{n,k}(f_s)\| > u\right) \le C P\left(\sup_{1\le s<\infty}\|\bar I_{n,k}(f_s)\| > \gamma u\right). \qquad (56)$$

The proof of Theorem 8.1 can be found in ^ or in Appendix B of my Lecture
Note m. Actually ^ contains only the proof of inequality (55), but (56) can
be deduced from it simply by introducing appropriate separable Banach spaces
and by exploiting that the universal constants in formula (55) do not depend on
the Banach space where the random variables are living. Theorem 8.1 is useful
for us, because it shows that Proposition 7.1 simply follows from its version
presented in Proposition 8.2 below, where $U$-statistics are replaced by decoupled
$U$-statistics. The distribution of a decoupled $U$-statistic does not change if the
sequences of random variables put in some coordinates of its kernel function are
replaced by an independent copy, and this is a very useful property in the
application of symmetrization arguments. Beside this, the usual arguments applied
in calculation with usual $U$-statistics can be adapted to the study of decoupled
$U$-statistics. Now I formulate the following version of Proposition 7.1.

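Proposition 8.2. Consider a class of functions $f \in \mathcal{F}$ on the $k$-fold product
$(X^k, \mathcal{X}^k)$ of a measurable space $(X, \mathcal{X})$, a probability measure $\mu$ on $(X, \mathcal{X})$
together with a sequence of independent and $\mu$ distributed random variables
$\xi_1,\dots,\xi_n$ which satisfy the conditions of Proposition 7.1. Let us take $k$
independent copies $\xi_1^{(j)},\dots,\xi_n^{(j)}$, $1 \le j \le k$, of the random sequence $\xi_1,\dots,\xi_n$,
and consider the decoupled $U$-statistics $\bar I_{n,k}(f)$, $f \in \mathcal{F}$, defined with their help
by formula (53). There exists a sufficiently large constant $K = K(k)$ together
with some number $\gamma = \gamma(k) > 0$ and threshold index $A_0 = A_0(k) > 0$ depending
only on the order $k$ of the decoupled $U$-statistics we consider such that if
$n\sigma^2 \ge K(L + \beta)\log n$ with $\beta = \max\left(\frac{\log D}{\log n}, 0\right)$, then the (degenerate) decoupled
$U$-statistics $\bar I_{n,k}(f)$, $f \in \mathcal{F}$, satisfy the following version of inequality (47):

$$P\left(\sup_{f\in\mathcal{F}}\left|n^{-k/2}\bar I_{n,k}(f)\right| \ge A n^{k/2}\sigma^{k+1}\right) \le e^{-\gamma A^{1/2k} n\sigma^2} \quad \text{if } A \ge A_0. \qquad (57)$$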

Proposition 8.2 and Theorem 8.1 imply Proposition 7.1. Hence it is enough
to concentrate on the proof of Proposition 8.2. It is natural to try to adapt
the method applied in the proof of Proposition 7.1 in the case $k = 1$. I try to
explain what kind of new problems appear in the multivariate case and how to
overcome them.

The proof of Proposition 7.1 was based on a symmetrization type result
formulated in Lemma 7.2 and Hoeffding’s inequality, Theorem 7.3. We have
to find the multivariate versions of these results. It is not difficult to find the
multivariate version of Hoeffding’s inequality. Such a result can be found in m
Theorem 12.3, or [21] contains an improved version with optimal constant in the
exponent. Here I do not formulate this result, I only explain its main content.
Let us consider a homogeneous polynomial of Rademacher functions of order $k$.
The multivariate version of Hoeffding’s inequality states that its tail distribution
can be bounded by that of $K\sigma\eta^k$ with some constant $K = K(k)$ depending only
on the order $k$ of the homogeneous polynomial, where $\eta$ is a standard normal
random variable, and $\sigma^2$ is the variance of the random homogeneous polynomial.

The problem about the multivariate generalization of Lemma 7.2 is much
harder. We want to prove the following multivariate version of this result.

Lemma 8.3. Let $\mathcal{F}$ be a class of functions on the space $(X^k, \mathcal{X}^k)$ which satisfies
the conditions of Proposition 7.1 with some probability measure $\mu$. Let us have
$k$ independent copies $\xi_1^{(j)},\dots,\xi_n^{(j)}$, $1 \le j \le k$, of a sequence of independent $\mu$
distributed random variables $\xi_1,\dots,\xi_n$ and a sequence of independent random
variables $\varepsilon = (\varepsilon_1,\dots,\varepsilon_n)$, $P(\varepsilon_l = 1) = P(\varepsilon_l = -1) = \frac12$, $1 \le l \le n$, which is
independent also of the random sequences $\xi_1^{(j)},\dots,\xi_n^{(j)}$, $1 \le j \le k$. Consider the
decoupled $U$-statistics $\bar I_{n,k}(f)$ defined with the help of these random variables by
formula (53) together with their randomized version $\bar I_{n,k}^{\varepsilon}(f)$ defined in (54) for
all $f \in \mathcal{F}$. There exists some constant $A_0 = A_0(k) > 0$ such that the inequality


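$$P\left(\sup_{f\in\mathcal{F}} n^{-k/2}\left|\bar I_{n,k}(f)\right| > A n^{k/2}\sigma^{k+1}\right) < 2^{k+1} P\left(\sup_{f\in\mathcal{F}}\left|\bar I_{n,k}^{\varepsilon}(f)\right| > 2^{-(k+1)} A n^{k}\sigma^{k+1}\right) + B n^{k-1} e^{-A^{1/(2k-1)} n\sigma^2/k} \qquad (58)$$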


holds for all $A \ge A_0$ with some appropriate constant $B = B(k)$. One can choose
for instance $B = 2^k$ in this result.

The estimate (58) in Lemma 8.3 is similar to formula (48) in Lemma 7.2.
There is a slight difference between them, because the right-hand side of (58)
contains an additional constant term. But this term is sufficiently small, and
its presence causes no problem as we try to prove Proposition 8.2 by means
of Lemma 8.3. In this proof we want to estimate the distribution of the supremum
of the decoupled $U$-statistics $\bar I_{n,k}(f)$, $f \in \mathcal{F}$, defined in formula (53), and
Lemma 8.3 helps us in reducing this problem to an analogous one, where these
decoupled $U$-statistics are replaced by the randomized decoupled $U$-statistics
$\bar I_{n,k}^{\varepsilon}(f)$, defined in formula (54). This reduced problem can be studied by taking
the conditional probability of the event whose probability is considered at the
right-hand side of (58) with respect to the condition that all random variables
$\xi_l^{(j)}$, $1 \le j \le k$, $1 \le l \le n$, take a prescribed value. These conditional
probabilities can be estimated by means of the multivariate version of the Hoeffding
inequality, and then an adaptation of the method described in the previous
section supplies the proof of Proposition 8.2. The proof is harder in this new case,
but no new principal difficulty arises.

Lemma 7.2 was proved by means of a simple result formulated in Lemma 7.4
which enabled us to introduce the randomizing terms $\varepsilon_j$, $1 \le j \le n$. In this
result we have taken beside the original sequence $\xi_1,\dots,\xi_n$ an independent
copy $\bar\xi_1,\dots,\bar\xi_n$. In the next Lemma 8.4 I formulate a multivariate version of
Lemma 7.4 which may help in the proof of Lemma 8.3. In its formulation I
introduce beside the $k$ independent copies $\xi_1^{(j)},\dots,\xi_n^{(j)}$, $1 \le j \le k$, of the original
sequence of independent, identically distributed random variables $\xi_1,\dots,\xi_n$
appearing in the definition of a decoupled $U$-statistic of order $k$ another $k$
independent copies $\bar\xi_1^{(j)},\dots,\bar\xi_n^{(j)}$, $1 \le j \le k$, of this sequence. For notational
convenience I reindex them, and I shall deal in Lemma 8.4 with $2k$ independent
copies $\xi_1^{(j,1)},\dots,\xi_n^{(j,1)}$ and $\xi_1^{(j,-1)},\dots,\xi_n^{(j,-1)}$, $1 \le j \le k$, of the original sequence
$\xi_1,\dots,\xi_n$.

Now I formulate Lemma 8.4.

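Lemma 8.4. Let us have a (non-empty) class of functions $\mathcal{F}$ of $k$ variables
$f(x_1,\dots,x_k)$ on a measurable space $(X^k, \mathcal{X}^k)$ together with $2k$ independent
copies $\xi_1^{(j,1)},\dots,\xi_n^{(j,1)}$ and $\xi_1^{(j,-1)},\dots,\xi_n^{(j,-1)}$, $1 \le j \le k$, of a sequence of
independent and identically distributed random variables $\xi_1,\dots,\xi_n$ on $(X, \mathcal{X})$
and another sequence of independent random variables $\varepsilon_1,\dots,\varepsilon_n$, $P(\varepsilon_j = 1) =
P(\varepsilon_j = -1) = \frac12$, $1 \le j \le n$, independent of all previously considered random
sequences. Let us denote the class of sequences of length $k$ consisting of
$\pm1$ digits by $V_k$, and let $m(v)$ denote the number of digits $-1$ in a sequence
$v = (v(1),\dots,v(k)) \in V_k$. Let us introduce with the help of the above notations
the random variables $\tilde I_{n,k}(f)$ and $\tilde I_{n,k}(f,\varepsilon)$ as

$$\tilde I_{n,k}(f) = \frac{1}{k!}\sum_{v\in V_k}(-1)^{m(v)}\sum_{\substack{1\le l_r\le n,\ r=1,\dots,k \\ l_r\ne l_{r'} \text{ if } r\ne r'}} f\left(\xi_{l_1}^{(1,v(1))},\dots,\xi_{l_k}^{(k,v(k))}\right) \qquad (59)$$

and

$$\tilde I_{n,k}(f,\varepsilon) = \frac{1}{k!}\sum_{v\in V_k}(-1)^{m(v)}\sum_{\substack{1\le l_r\le n,\ r=1,\dots,k \\ l_r\ne l_{r'} \text{ if } r\ne r'}} \varepsilon_{l_1}\cdots\varepsilon_{l_k}\, f\left(\xi_{l_1}^{(1,v(1))},\dots,\xi_{l_k}^{(k,v(k))}\right) \qquad (60)$$

for all $f \in \mathcal{F}$. The joint distributions of the random variables $\{\tilde I_{n,k}(f),\ f \in \mathcal{F}\}$
and $\{\tilde I_{n,k}(f,\varepsilon),\ f \in \mathcal{F}\}$ defined in formulas (59) and (60) agree.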
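
In the smallest non-trivial case the statement of Lemma 8.4 can be checked exactly by enumeration. The sketch below assumes $k = 2$, $n = 2$, random variables uniformly distributed on $\{0, 1\}$ and two illustrative kernels; these choices and the function names are assumptions made only for the example. It works with $k!$ times the alternating sums, which does not affect the comparison of the joint laws.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def stieltjes_sum(kernel, xi, n, k, eps):
    """k! times the alternating sum over v in V_k, without random signs if all
    eps are +1 and with the signs eps otherwise.
    xi[j][s] is the sequence with upper index (j + 1, s), s in {+1, -1}."""
    total = 0
    for v in product((1, -1), repeat=k):
        sign = (-1) ** v.count(-1)                     # (-1)^{m(v)}
        for idx in permutations(range(n), k):          # pairwise distinct indices
            weight = 1
            for l in idx:
                weight *= eps[l]
            total += sign * weight * kernel(*(xi[j][v[j]][idx[j]] for j in range(k)))
    return total


def joint_law(kernels, n, k, randomized):
    """Exact joint law of the alternating sums for all kernels, the 2k sequences
    being independent with i.i.d. coordinates uniform on {0, 1}."""
    eps_choices = list(product((-1, 1), repeat=n)) if randomized else [(1,) * n]
    mass = Fraction(1, 2 ** (2 * k * n) * len(eps_choices))
    law = Counter()
    for bits in product((0, 1), repeat=2 * k * n):
        xi = [{1: bits[2 * j * n:(2 * j + 1) * n],
               -1: bits[(2 * j + 1) * n:(2 * j + 2) * n]} for j in range(k)]
        for eps in eps_choices:
            law[tuple(stieltjes_sum(f, xi, n, k, eps) for f in kernels)] += mass
    return law


# Hypothetical kernels of two variables.
kernels = [lambda x, y: x * y, lambda x, y: int(x != y)]
print(joint_law(kernels, 2, 2, False) == joint_law(kernels, 2, 2, True))
```

The reason behind the agreement is the same as for $k = 1$: exchanging the sequences with upper indices $(j,1)$ and $(j,-1)$ at those coordinates $l$ where $\varepsilon_l = -1$ leaves the joint distribution unchanged and multiplies each term by $\varepsilon_{l_1}\cdots\varepsilon_{l_k}$.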


The proof of Lemma 8.4 can be found as Lemma 11.5 in m. Actually, this
proof is not difficult. Let us observe that the inner sum in formula (59) is a
decoupled $U$-statistic, and in formula (60) it is a randomized decoupled
$U$-statistic. (Actually they are multiplied by $k!$.) In formulas (59) and (60) such a
linear combination of these expressions was taken which is similar to the formula
appearing in the definition of Stieltjes measures.

Let us list the functions in the class of functions in Lemma 8.3 in the form
$\{f_1, f_2, \dots\} = \mathcal{F}$, and introduce the quantities

$$Z_p = n^{-k/2}\frac{1}{k!}\sum_{\substack{1\le l_r\le n,\ r=1,\dots,k \\ l_r\ne l_{r'} \text{ if } r\ne r'}} f_p\left(\xi_{l_1}^{(1,1)},\dots,\xi_{l_k}^{(k,1)}\right), \quad p = 1, 2, \dots, \qquad (61)$$

and

$$\bar Z_p = Z_p - n^{-k/2}\tilde I_{n,k}(f_p), \quad p = 1, 2, \dots, \qquad (62)$$

with the random variables $\tilde I_{n,k}(f)$ introduced in (59) with the function $f = f_p$.
We would like to prove Lemma 8.3 with the help of Lemma 8.4. This can be done
with the help of some calculations, but this requires overcoming some very hard
problems. We should like to bound a probability of the form $P\left(\sup_{1\le p<\infty} Z_p > u\right)$
from above with the help of a probability of the form
$P\left(\sup_{1\le p<\infty}(Z_p - \bar Z_p) > \frac{u}{2}\right)$
for all sufficiently large numbers $u$. The question arises how to prove such an
estimate. This problem is the most difficult part of the proof.

In the case $k = 1$ considered in the previous section the analogous problem
could be simply solved by means of the Symmetrization Lemma formulated in
Lemma 7.5. This lemma cannot be applied in the present case, because it has
an important condition: it demands that the sequences of random variables $Z_p$,
$p = 1, 2, \dots$, and $\bar Z_p$, $p = 1, 2, \dots$, should be independent. In the problem of
Section 7 we could work with such sequences which satisfy this condition. On
the other hand, the sequences $Z_p$ and $\bar Z_p$, $p = 1, 2, \dots$, defined in formulas (61)
and (62) we have to work with now are not independent in the case $k \ge 2$. They
satisfy some weak sort of independence, and the problem is how to exploit this
to get the estimates we need.

Let us first formulate such a version of the Symmetrization Lemma which
can be applied also in the problem investigated now. This is done in the next
Lemma 8.5.

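Lemma 8.5 (Generalized version of the Symmetrization Lemma). Let
$Z_p$ and $\bar Z_p$, $p = 1, 2, \dots$, be two sequences of random variables on a probability
space $(\Omega, \mathcal{A}, P)$. Let a $\sigma$-algebra $\mathcal{B} \subset \mathcal{A}$ be given on the probability space $(\Omega, \mathcal{A}, P)$
together with a $\mathcal{B}$-measurable set $B$ and two numbers $\alpha > 0$ and $\beta > 0$ such that
the random variables $Z_p$, $p = 1, 2, \dots$, are $\mathcal{B}$ measurable, and the inequality

$$P(|\bar Z_p| \le \alpha \,|\, \mathcal{B})(\omega) \ge \beta \quad \text{for all } p = 1, 2, \dots \text{ if } \omega \in B \qquad (63)$$

holds. Then

$$P\left(\sup_{1\le p<\infty}|Z_p| > \alpha + u\right) \le \frac{1}{\beta}P\left(\sup_{1\le p<\infty}|Z_p - \bar Z_p| > u\right) + (1 - P(B)) \qquad (64)$$
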

for all $u > 0$.

The proof of Lemma 8.5 is contained in m under the name Lemma 13.1, and
it is not hard. It consists of a natural adaptation
of the proof of the original Symmetrization Lemma, presented in Lemma 7.5.
The hard problem is to check the condition in formula (63) in concrete
applications. In our case we would like to apply this lemma to the random variables
$Z_p$ and $\bar Z_p$, $p = 1, 2, \dots$, defined in formulas (61) and (62) together with the
$\sigma$-algebra $\mathcal{B} = \mathcal{B}(\xi_l^{(j,1)},\ 1 \le j \le k,\ 1 \le l \le n)$ generated by the random variables
$\xi_1^{(j,1)},\dots,\xi_n^{(j,1)}$, $1 \le j \le k$. We would like to show that relation (63) holds with
this choice on a set $B$ of probability almost 1. (Let me emphasize that in (63)
a set of inequalities must hold for all $p = 1, 2, \dots$ simultaneously if $\omega \in B$.)

In the analogous problem considered in Section 7 condition (51) had to be
checked with some appropriate constants $\alpha > 0$ and $\beta > 0$ for the random
variables $\bar Z_p$, $p = 1, 2, \dots$, defined in formula (52). This could be done fairly simply
by the calculation of the variance of the random variables $\bar Z_p$, $p = 1, 2, \dots$.
A natural adaptation of this approach is to bound from above the supremum
$\sup_{1\le p<\infty} E(\bar Z_p^2 \,|\, \mathcal{B})$ of the conditional second moments of the random variables $\bar Z_p$,
$1 \le p < \infty$, defined in (62) with respect to the $\sigma$-algebra $\mathcal{B}$ and to show that this
expression is small with large probability. I have followed this approach in m
and m. One can get the desired estimates, but many unpleasant technical
details have to be tackled in the proof. I do not discuss here all details, I only
briefly explain what kind of problems we meet when we try to apply this method
in the special case $k = 2$ and give some indications of how they can be overcome.

In the case $k = 2$ the definition of $\bar Z_p$ is very similar to that of $\tilde I_{n,2}(f)$
defined in (59) with the function $f = f_p$. The only difference is that in the
definition of $\bar Z_p$ we have to take the values $v = (1,-1)$, $v = (-1,1)$ and $v =
(-1,-1)$ in the outer sum, i.e. the term $v = (1,1)$ is dropped, and we multiply
by $(-1)^{m(v)+1}$ instead of $(-1)^{m(v)}$. We can get the desired estimate on the
supremum of the conditional second moments if we can prove a good estimate on
the conditional second moments of the supremum of the inner sums in $\tilde I_{n,2}(f_p)$,
$1 \le p < \infty$, in the case of each index $v = (1,-1)$, $v = (-1,1)$ and $v = (-1,-1)$.
If we can get a good estimate in the case $v = (1,-1)$, then we can get it in the
remaining cases, too. So we have to give a good bound on the expression



/2 _ 

Moreover, since the sequence of random variables $\xi_l^{(2,-1)}$, $1 \le l \le n$, is
independent of the $\sigma$-algebra $\mathcal{B}$, and the canonical property of the functions $f_p$
implies some orthogonalities, the estimation of the expression in (65) can be
simplified. A detailed calculation shows that it is enough to prove the following
inequality:

Let us have a countable class $\mathcal{F}$ of canonical functions $f(x,y)$ with respect
to a probability measure $\mu$ on the second power $(X^2, \mathcal{X}^2)$ of a measurable space
$(X, \mathcal{X})$, which is $L_2$-dense with some exponent $L$ and parameter $D$ (the
probability measure $\mu$ is living in the space $(X, \mathcal{X})$), together with a sequence of
independent and $\mu$-distributed random variables $\xi_1,\dots,\xi_n$, $n \ge 2$, on $(X, \mathcal{X})$,
and let the relations

$$\int f^2(x,y)\,\mu(dx)\,\mu(dy) \le \sigma^2, \qquad \sup|f(x,y)| \le 1 \quad \text{for all } f \in \mathcal{F}$$

hold with some number $0 < \sigma < 1$ which satisfies the relation $n\sigma^2 \ge K(L + \beta)\log n$
with $\beta = \max\left(\frac{\log D}{\log n}, 0\right)$ and a sufficiently large fixed constant $K > 0$.

Then the inequality 



holds for all $A \ge A_0$ with some sufficiently large fixed constant $A_0$.

Inequality (66) is similar to relation (47) in Proposition 7.1 in the case $k = 1$,
but it does not follow from it. (It follows from (47) in the special case when
the function $f$ does not depend on the argument $y$ with respect to which we
integrate.) On the other hand, inequality (66) can be proved by working out
a similar, although somewhat more complicated symmetrization argument and
induction procedure as was done in the proof of Proposition 7.1 in the case
$k = 1$. After this, inequality (66) enables us to work out the symmetrization
argument we need to prove Proposition 7.1 for $k = 2$. This procedure can be
continued for all $k = 2, 3, \dots$. If we have already proved Proposition 7.1 for
some $k$, then an inequality can be formulated and proved with the help of the
already known results which enables us to carry out the symmetrization
procedure needed in the proof of Proposition 7.1 in the case $k + 1$. This
is a rather cumbersome method with a lot of technical details, hence its
detailed explanation had to be omitted from an overview paper. In the work m
Sections 13, 14 and 15 deal only with the proof of Proposition 7.1. Section 13
contains the proof of some preparatory results and the formulation of the
inductive statements we have to prove to get the result of Proposition 7.1. Section 14
contains the proof of the symmetrization arguments we need, and finally the
proof is completed with their help in Section 15.

There is an interesting theory of Talagrand about so-called concentration 
inequalities. This theory has some relation to the questions discussed in this 
paper. In the last section this relation will be discussed together with some 
other results and open problems. 

9. Relation with other results and some open problems 

Talagrand worked out a deep theory about so-called concentration inequalities.
(See his overview in paper IHS] about this subject.) His results are closely related
to the supremum estimates described in this paper. First I discuss this relation.

9.1. On Talagrand’s concentration inequalities

Talagrand considered a sequence of independent random variables $\xi_1,\dots,\xi_n$, a
class of functions $\mathcal{F}$, took the partial sums $\sum_{j=1}^n f(\xi_j)$ for all functions $f \in \mathcal{F}$,
and investigated their supremum. He proved estimates which state that
this supremum is very close to its expected value (it is concentrated around it).
The following theorem in paper m is a typical result in this direction.

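Theorem 9.1 (Theorem of Talagrand). Consider $n$ independent and
identically distributed random variables $\xi_1,\dots,\xi_n$ with values in some measurable
space $(X, \mathcal{X})$. Let $\mathcal{F}$ be some countable family of real-valued measurable
functions of $(X, \mathcal{X})$ such that $\|f\|_\infty \le b < \infty$ for every $f \in \mathcal{F}$. Let
$Z = \sup_{f\in\mathcal{F}}\sum_{i=1}^n f(\xi_i)$ and $v = E\left(\sup_{f\in\mathcal{F}}\sum_{i=1}^n f^2(\xi_i)\right)$. Then for every positive number $x$,

$$P(Z \ge EZ + x) \le K\exp\left\{-\frac{1}{K'}\frac{x}{b}\log\left(1 + \frac{xb}{v}\right)\right\} \qquad (67)$$

and

$$P(Z \ge EZ + x) \le K\exp\left\{-\frac{x^2}{2(c_1 v + c_2 bx)}\right\}, \qquad (68)$$

where $K$, $K'$, $c_1$ and $c_2$ are universal positive constants. Moreover, the same
inequalities hold when replacing $Z$ by $-Z$.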

Inequality (67) can be considered as a generalization of Bennett’s inequality,
inequality (68) as a generalization of Bernstein’s inequality. In these estimates
the distribution of the supremum of possibly infinitely many partial sums of
independent and identically distributed random variables is considered. A remarkable
feature of Theorem 9.1 is that it imposes no condition about the structure of
the class of functions $\mathcal{F}$. In this respect it differs from the supremum estimates
of this paper, where such a class of functions $\mathcal{F}$ is considered which satisfies a
so-called $L_2$-density property.

Talagrand’s study was also continued by other authors who got interesting
results. In particular, the works of M. Ledoux m and P. Massart m are worth
mentioning. In these works the above mentioned result was improved. Such a
version was proved which also holds for the supremum of appropriate classes of sums
of independent but not necessarily identically distributed random variables. (On
the other hand, I do not know of such a generalization in which $U$-statistics of
higher order are considered.) The improvements of these works consist for
instance in a version of Theorem 9.1 where the quantity $v = E\left(\sup_{f\in\mathcal{F}}\sum_{i=1}^n f^2(\xi_i)\right)$
is replaced by $\sigma^2 = \sup_{f\in\mathcal{F}}\sum_{i=1}^n \mathrm{Var}(f(\xi_i))$, i.e. the supremum of the expectation
of the individual partial sums $\sum_{i=1}^n f^2(\xi_i)$ is considered (the statement that $\sigma^2$
equals the supremum of the expected values of the partial sums $\sum_{i=1}^n f^2(\xi_i)$ holds
if $Ef(\xi_i) = 0$ for all random variables $\xi_i$ and functions $f$) instead of the
expectation of the supremum of these partial sums.

On the other hand, the estimates in Theorem 9.1 contain the expected value
$EZ = E\left(\sup_{f\in\mathcal{F}}\sum_{i=1}^n f(\xi_i)\right)$, and this quantity appears in all concentration type
inequalities. This fact has deep consequences which deserve a more detailed
discussion.

Let us consider Theorem 9.1 or one of its improvements and try to understand
what kind of solution they provide for problem b) or b') formulated in Section 1
in the case $k = 1$. They supply a good estimate on the probabilities we consider
for the numbers $u > n^{-1/2}EZ = n^{-1/2}E\left(\sup_{f\in\mathcal{F}}\sum_{i=1}^n f(\xi_i)\right)$. But to apply these
results we need a good estimate on the expectation $EZ$ of the supremum of
the partial sums we consider, and the proof of such an estimate is a highly
non-trivial problem.

Let us consider problem b') (in the case $k=1$) for such a class of functions $\mathcal F$ which satisfies the conditions of Theorem 6.3. The considerations of an earlier section show that there are such classes of functions $\mathcal F$ which satisfy the conditions of Theorem 6.3 and for which the probability $P\Bigl(\sup_{f\in\mathcal F}n^{-1/2}\sum_{i=1}^n f(\xi_i)>\alpha\sigma\log^{1/2}\frac2\sigma\Bigr)$ is almost 1 with an appropriate small number $\alpha>0$ for all large enough sample sizes $n$. (Here the number $\sigma$ is the same as in Theorem 6.3.) This means that $En^{-1/2}Z\ge(\alpha-\varepsilon)\sigma\log^{1/2}\frac2\sigma$ for all $\varepsilon>0$ if the sample size $n$ of the sequence $\xi_1,\dots,\xi_n$ is greater than $n_0=n_0(\varepsilon,\sigma)$. Some calculation also shows that under the conditions of Theorem 6.3 $En^{-1/2}Z\le K\sigma\log^{1/2}\frac2\sigma$ with an appropriate number $K>0$. (In this calculation some difficulty may arise, because Theorem 6.3 for $k=1$ does not yield a good estimate if $u\ge\sqrt n\sigma^2$. But we can estimate the probability $P\Bigl(\sup_{f\in\mathcal F}n^{-1/2}\sum_{i=1}^n f(\xi_i)>u\Bigr)$ by applying this theorem with $\bar\sigma^2=u/\sqrt n$ in place of $\sigma^2$ if $u\ge\sqrt n\sigma^2$, and this estimate is sufficient for us. We get the upper bound we formulated for $n^{-1/2}EZ$ from Theorem 6.3 only under the condition $n\sigma^2\ge\text{const.}\,\log\frac2\sigma$ with some appropriate constant. It can be seen that this condition is really needed; it did not appear because of a weakness of our method. I omit the details of the calculation.) Then the concentration inequality Theorem 9.1, or more precisely its improvement, Theorem 3 in paper [?], which gives a similar inequality but with the quantity $\sigma^2$ instead of $v$, implies Theorem 6.3 in the case $k=1$. This means that Theorem 6.3 can be deduced from concentration type inequalities in the case $k=1$ if we can show that under its conditions $En^{-1/2}Z\le K\sigma\log^{1/2}\frac2\sigma$ with some appropriate $K>0$ depending only on the exponent and parameter of the $L_2$-dense class $\mathcal F$. Such an estimate can be proved (see the proof in [?] on the basis of paper [?]), but it requires rather long and non-trivial considerations. I prefer a direct proof of Theorem 6.3.

Finally I discuss a refinement of Theorems 4.1 and 4.3 promised in a remark at the end of Section 4 together with some open problems.

9.2. Some refinements of the estimate in Theorems 4.1 and 4.3

If we have a bound on the $L_2$ and $L_\infty$ norm of the kernel function $f$ of a $U$-statistic $I_{n,k}(f)$, but we have no additional information about the behaviour of $f$ (and such a situation is quite common in mathematical statistics problems), then the estimate of Theorem 4.3 about the distribution of $U$-statistics cannot be considerably improved. On the other hand, one would like to prove such a multi-dimensional type version of the large deviation theorem about partial sums of independent random variables which gives a good asymptotic formula for the probability $P(n^{-k/2}I_{n,k}(f)>u)$ for large values $u$. Such an estimate should depend on the function $f$. A similar question can be posed about the distribution of multiple Wiener-Itô integrals $Z_{\mu,k}(f)$ if $k\ge2$, because the distribution of such random integrals (unlike the degenerate case $k=1$) is not determined by their variance.

Such large deviation problems are very hard, and I know of no result in this direction. On the other hand, some quantities can be introduced which enable us to give a better estimate on the distribution of Wiener-Itô integrals or $U$-statistics if they are known. Such results were known for Wiener-Itô integrals $Z_{\mu,2}(f)$ and $U$-statistics $I_{n,2}(f)$ of order 2 earlier, and quite recently they were generalized for all $k\ge2$. I describe them and show that they are useful in the solution of some problems. My formulation will differ a little bit from the previous ones. In particular, I shall speak about Wiener-Itô integrals where previous authors considered only polynomials of Gaussian random vectors. But the Wiener-Itô integral presentation of these results seems to be more natural to me. First I formulate the estimate about Wiener-Itô integrals of order 2 proved in [?] by Hanson and Wright.

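Theorem 9.2. Let a two-fold Wiener-Itô integral

$$Z_{\mu,2}(f)=\frac1{2!}\int f(x,y)\,\mu_W(dx)\,\mu_W(dy)$$

be given, where $\mu_W$ is a white noise with a non-atomic reference measure $\mu$, and the function $f$ satisfies the inequalities

$$\int f^2(x,y)\,\mu(dx)\,\mu(dy)\le\sigma^2 \tag{69}$$

and

$$\int f(x,y)g_1(x)g_2(y)\,\mu(dx)\,\mu(dy)\le D \tag{70}$$

with some number $D>0$ for all functions $g_1$ and $g_2$ such that $\int g_j^2(x)\,\mu(dx)\le1$, $j=1,2$. There exists a universal constant $K>0$ such that the inequality

$$P(|Z_{\mu,2}(f)|>u)\le K\exp\Bigl\{-\frac1K\min\Bigl(\frac{u^2}{\sigma^2},\frac uD\Bigr)\Bigr\} \tag{71}$$
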
holds for all $u>0$.

As was remarked earlier in this work, we can assume without violating the generality that the function $f$ in the definition of Wiener-Itô integrals is symmetric. In this case Theorem 9.2 can be reformulated to a simpler statement.

To do this let us define with the help of the (symmetric) function $f$ the following so-called Hilbert-Schmidt operator $A_f$ in the $L_2(\mu)$ space of square integrable functions with respect to the measure $\mu$: $A_fv(x)=\int f(x,y)v(y)\,\mu(dy)$ for all $L_2(\mu)$ measurable functions $v(\cdot)$. It is known that $A_f$ is a compact, self-adjoint operator, hence it has a discrete spectrum. Let $\lambda_1,\lambda_2,\dots$ denote the eigenvalues of the operator $A_f$. It follows from the theory of Hilbert-Schmidt operators and the Itô formula for multiple Wiener-Itô integrals that the identity $2Z_{\mu,2}(f)=\sum_{j=1}^\infty\lambda_j(\eta_j^2-1)$ holds with some appropriately defined independent standard normal random variables $\eta_1,\eta_2,\dots$. Beside this, $\sum_{j=1}^\infty\lambda_j^2=\int f^2(x,y)\,\mu(dx)\,\mu(dy)$. Hence condition (69) can be reformulated as $\sum_{j=1}^\infty\lambda_j^2\le\sigma^2$, and condition (70) is equivalent to the statement that $\sup_j|\lambda_j|\le D$. In such a way Theorem 9.2 can be reduced to another statement whose proof is simpler.
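
The reduction above can be made concrete for a kernel of finite rank. The following Python sketch is only an illustration under stated assumptions: the function name `hanson_wright_exponent` is hypothetical, the eigenvalues are supplied by hand, and the exponent is taken in the form $\min(u^2/\sigma^2, u/D)$ with the smallest admissible values $\sigma^2=\sum\lambda_j^2$ and $D=\max_j|\lambda_j|$. The universal constant $K$ is not specified by the theorem, so it is left out.

```python
def hanson_wright_exponent(eigenvalues, u):
    """Return (sigma2, D, exponent) for a finite list of eigenvalues of A_f.

    sigma2   = sum of lambda_j^2  (smallest admissible sigma^2 in (69))
    D        = max |lambda_j|     (smallest admissible D in (70))
    exponent = min(u^2 / sigma2, u / D); the universal constant K of the
               theorem is unknown and therefore omitted here.
    """
    if not eigenvalues:
        raise ValueError("at least one eigenvalue is needed")
    if u <= 0:
        raise ValueError("u must be positive")
    sigma2 = sum(lam * lam for lam in eigenvalues)  # Hilbert-Schmidt norm squared
    D = max(abs(lam) for lam in eigenvalues)        # operator norm
    if D == 0:
        raise ValueError("the kernel must not vanish identically")
    return sigma2, D, min(u * u / sigma2, u / D)


# Hypothetical example: four eigenvalues equal to 1/2.
for u in (1.0, 4.0):
    sigma2, D, exponent = hanson_wright_exponent([0.5] * 4, u)
    print(u, sigma2, D, exponent)
```

In this example the Gaussian term $u^2/\sigma^2$ is the smaller one for $u<\sigma^2/D$, and the term $u/D$ takes over for larger $u$.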

Theorem 9.2 yields a useful estimate if $D^2\ll\sigma^2$. In this case it states that for large numbers $u$ the bound $P(Z_{\mu,2}(f)>u)\le\text{const.}\,e^{-\text{const.}\,u/\sigma}$ supplied by Theorem 4.1 can be improved to the bound $P(Z_{\mu,2}(f)>u)\le\text{const.}\,e^{-u/KD}$. The correction term $\frac{u^2}{\sigma^2}$ at the right-hand side of (71) is needed to get an estimate which holds for all $u>0$. It may be worthwhile recalling the following result (see [?] or [?], Theorem 6.6). All $k$-fold Wiener-Itô integrals $Z_{\mu,k}(f)$ satisfy the inequality $P(|Z_{\mu,k}(f)|>u)>Ke^{-Au^{2/k}}$ with some $K=K(f,\mu)>0$ and $A=A(f,\mu)>0$. There is a strictly positive number $A=A(f,\mu)$ in the exponent of the last relation, but the proof of [?] yields no explicit lower bound for it.

There is a similar estimate about the distribution of degenerate $U$-statistics of order 2. This is the content of the following Theorem 9.3.


Theorem 9.3. Let a sequence $\xi_1,\dots,\xi_n$ of independent $\mu$ distributed random variables be given together with a function $f(x,y)$ canonical with respect to the measure $\mu$, and consider the (degenerate) $U$-statistic $I_{n,2}(f)$ defined with the help of the above quantities. Let us assume that the function $f$ satisfies conditions (69) and (70) with some $\sigma>0$ and $D>0$, and also the relations

$$\sup_x\int f^2(x,y)\,\mu(dy)\le A_1,\quad \sup_y\int f^2(x,y)\,\mu(dx)\le A_2,\quad \sup_{x,y}|f(x,y)|\le B \tag{72}$$

hold with some appropriate constants $A_1>0$, $A_2>0$ and $B>0$. Then there exists a universal constant $K>0$ such that the inequality

$$P(n^{-1}|I_{n,2}(f)|>u)\le K\exp\Bigl\{-\frac1K\min\Bigl(\frac{u^2}{\sigma^2},\frac uD,\frac{n^{1/3}u^{2/3}}{(A_1+A_2)^{1/3}},\frac{n^{1/2}u^{1/2}}{B^{1/2}}\Bigr)\Bigr\} \tag{73}$$

is valid for all $u>0$.

Theorem 9.3 was proved in [?]. The estimate of Theorem 9.3 is similar to that of Theorem 9.2; the difference between them is that in formula (73) some additional correction terms had to be inserted to make it valid for all $u>0$. But the proof of Theorem 9.3 is much harder. It can be shown that the estimate (73) implies that of Theorem 4.3 in the special case $k=2$ if we disregard the appearance of the not explicitly defined universal constant $K$ in it.

To see this observe that Theorem 4.3 contains the conditions $u\le n\sigma^3$ and $B\le1$. Since $\mu$ is a probability measure and $|f|\le B\le1$, we may take $A_1=A_2=1$, and the above conditions imply that $\frac{n^{1/3}u^{2/3}}{(A_1+A_2)^{1/3}}\ge\frac{n^{1/3}u^{2/3}}{2^{1/3}}\ge\frac{u}{2^{1/3}\sigma}$ and $\frac{n^{1/2}u^{1/2}}{B^{1/2}}\ge n^{1/2}u^{1/2}\ge\frac{u}{\sigma^{3/2}}\ge\frac u\sigma$, since $\sigma\le1$ in this case. Beside this, $\frac uD\ge\frac u\sigma$, because by the Schwarz inequality $D=\sigma$ is an admissible choice in (70). The above relations imply that in the case $u>\sigma$ the estimate (73) is weakened if the expression in its exponent is replaced by $\text{const.}\,\frac u\sigma$. Theorem 4.3 trivially holds if $0\le u\le\sigma$.

Theorem 9.3 is useful in such problems where a refinement of the estimate in Theorem 4.3 is needed which exploits better the properties of the kernel function $f$ of a degenerate $U$-statistic of order 2. Such a situation appears in paper [?], where the law of iterated logarithm is investigated for degenerate $U$-statistics of order 2.

Let us consider an infinite sequence $\xi_1,\xi_2,\dots$ of independent $\mu$ distributed random variables together with a function $f$ canonical with respect to the measure $\mu$, and define the degenerate $U$-statistic $I_{n,2}(f)$ with their help for all $n=1,2,\dots$. In paper [?] the necessary and sufficient condition of the law of iterated logarithm is given for such a sequence. More explicitly, it is proved that

$$\limsup_{n\to\infty}\frac{|I_{n,2}(f)|}{n\log\log n}<\infty\quad\text{with probability 1}$$

if and only if the following two conditions are satisfied:

a) $\int\min(f^2(x,y),u)\,\mu(dx)\,\mu(dy)\le C\log\log u$ with some $C<\infty$ for all $u\ge10$.

b) $\int f(x,y)g(x)h(y)\,\mu(dx)\,\mu(dy)\le C$ with some appropriate $C<\infty$ for all such pairs of functions $g$ and $h$ which satisfy the relations $\int g^2(x)\,\mu(dx)\le1$, $\int h^2(x)\,\mu(dx)\le1$, $\sup|g(x)|<\infty$, $\sup|h(x)|<\infty$.


The above result is proved by means of a clever truncation of the terms in the $U$-statistics and an application of the estimation of Theorem 9.3 for these truncated $U$-statistics. It has the form one would expect by analogy with the classical law of iterated logarithm for sums of independent, identically distributed random variables with expectation zero, but it also has an interesting, unexpected feature. The classical law of iterated logarithm for sums of i.i.d. random variables holds if and only if the terms in the sum have finite variance. (The only if part is proved in paper [?] or [?].) The above formulated law of iterated logarithm for degenerate $U$-statistics also holds in the case of finite second moment, i.e. if $Ef^2(\xi_1,\xi_2)<\infty$, but as the authors in [?] show in an example, there are also cases when it holds, although $Ef^2(\xi_1,\xi_2)=\infty$. Paper [?] is another example where Theorem 9.3 can be successfully applied to solve certain problems.


To formulate the generalization of Theorems 9.2 and 9.3 for general $k\ge2$ some notations have to be introduced. Given a finite set $A$ let $\mathcal P(A)$ denote the set of all its partitions. If a partition $P=\{B_1,\dots,B_s\}\in\mathcal P(A)$ consists of $s$ elements then we say that this partition has order $s$, and write $|P|=s$. In the special case $A=\{1,\dots,k\}$ the notation $\mathcal P(A)=\mathcal P_k$ will be used. Given a measurable space $(X,\mathcal X)$ with a probability measure $\mu$ on it together with a finite set $B=\{b_1,\dots,b_j\}$ let us introduce the following notations. Take $j$ different copies $(X_{b_r},\mathcal X_{b_r})$ and $\mu_{b_r}$, $1\le r\le j$, of this measurable space and probability measure indexed by the elements of the set $B$, and define their product $(X^{(B)},\mathcal X^{(B)},\mu^{(B)})=\Bigl(\prod_{r=1}^jX_{b_r},\prod_{r=1}^j\mathcal X_{b_r},\prod_{r=1}^j\mu_{b_r}\Bigr)$. The points $(x_{b_1},\dots,x_{b_j})\in X^{(B)}$ will be denoted by $x^{(B)}\in X^{(B)}$ in the sequel. With the help of the above notations I introduce the quantities needed in the formulation of the generalization of Theorems 9.2 and 9.3.

Let a function $f=f(x_1,\dots,x_k)$ be given on the $k$-fold product $(X^k,\mathcal X^k,\mu^k)$ of a measurable space $(X,\mathcal X)$ with a probability measure $\mu$. For all partitions $P=\{B_1,\dots,B_s\}\in\mathcal P_k$ of the set $\{1,\dots,k\}$ consider the functions $g_r(x^{(B_r)})$ on the space $X^{(B_r)}$, $1\le r\le s$, and define with their help the quantity

$$\alpha(P)=\alpha(P,f,\mu)=\sup_{g_1,\dots,g_s}\Bigl\{\int f(x_1,\dots,x_k)g_1(x^{(B_1)})\cdots g_s(x^{(B_s)})\,\mu(dx_1)\dots\mu(dx_k)\colon\ \int g_r^2(x^{(B_r)})\,\mu^{(B_r)}(dx^{(B_r)})\le1\text{ for all }1\le r\le s\Bigr\}. \tag{74}$$

In the estimation of Wiener-Itô integrals of order $k$ the quantities $\alpha(P)$, $P\in\mathcal P_k$, play such a role as the numbers $D$ and $\sigma$ introduced in formulas (70) and (69) in Theorem 9.2. Observe that in the case $|P|=1$, i.e. if $P=\{\{1,\dots,k\}\}$, the identity $\alpha^2(P)=\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k)$ holds. The following estimate is valid for Wiener-Itô integrals of general order (see [15]).

Theorem 9.4. Let a $k$-fold Wiener-Itô integral $Z_{\mu,k}(f)$, $k\ge1$, be defined with the help of a white noise $\mu_W$ with a non-atomic reference measure $\mu$ and a kernel function $f$ of $k$ variables such that $\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k)<\infty$. There is some universal constant $C(k)<\infty$ depending only on the order $k$ of the random integral such that the inequality

$$P(|Z_{\mu,k}(f)|>u)\le C(k)\exp\Bigl\{-\frac1{C(k)}\min_{1\le s\le k}\ \min_{P\in\mathcal P_k,\,|P|=s}\Bigl(\frac u{\alpha(P)}\Bigr)^{2/s}\Bigr\} \tag{75}$$


holds for all $u>0$ with the quantities $\alpha(P)$, $P\in\mathcal P_k$, defined in formula (74).

Also the following converse estimate holds which shows that the above estimate is sharp. (See again paper [15].) This estimate also yields an improvement of the result in [?] mentioned in this subsection.

Theorem 9.4'. The random integral $Z_{\mu,k}(f)$ considered in Theorem 9.4 also satisfies the inequality

$$P(|Z_{\mu,k}(f)|>u)\ge\frac1{C(k)}\exp\Bigl\{-C(k)\min_{1\le s\le k}\ \min_{P\in\mathcal P_k,\,|P|=s}\Bigl(\frac u{\alpha(P)}\Bigr)^{2/s}\Bigr\}$$

for all $u>0$ with some universal constant $C(k)>0$ depending only on the order $k$ of the integral and the quantities $\alpha(P)$, $P\in\mathcal P_k$, defined in formula (74).

To formulate the result about the distribution of degenerate $U$-statistics for all $k\ge2$ an analog of the expression $\alpha(P)$ defined in (74) has to be introduced. Let us consider a set $A\subset\{1,\dots,k\}$ with $|A|=r$ elements, $0\le r<k$, and a partition $P=\{B_1,\dots,B_s\}\in\mathcal P(\{1,\dots,k\}\setminus A)$, $1\le s\le k-r$, of its complement $\{1,\dots,k\}\setminus A$ together with a function $f(x_1,\dots,x_k)$ square integrable with respect to the $k$-fold product $\mu^k$ of a probability measure $\mu$ on a measurable space $(X,\mathcal X)$. Let us introduce, similarly to the definition (74), the quantities

$$\alpha(P,x^{(A)})=\sup_{g_1,\dots,g_s}\Bigl\{\int f(x_1,\dots,x_k)g_1(x^{(B_1)})\cdots g_s(x^{(B_s)})\prod_{j\in\{1,\dots,k\}\setminus A}\mu(dx_j)\colon\ \int g_u^2(x^{(B_u)})\,\mu^{(B_u)}(dx^{(B_u)})\le1\text{ for all }1\le u\le s\Bigr\} \tag{76}$$

for all $x^{(A)}=\{x_j,\ j\in A\}\in X^{(A)}$, where $g_u$ are functions defined on $X^{(B_u)}$, $1\le u\le s$, and put

$$\alpha(A,P)=\sup_{x^{(A)}\in X^{(A)}}\alpha(P,x^{(A)}). \tag{77}$$

To consider also the case $|A|=k$ when $\{1,\dots,k\}\setminus A=\emptyset$ let us make the following convention. Let us also speak about the partitions of the empty set by saying that its only partition is the empty set itself. Beside this, put $|\emptyset|=0$, and

$$\alpha(\{1,\dots,k\},\emptyset)=\sup|f(x_1,\dots,x_k)|. \tag{78}$$

With the help of the above notations the estimate about the distribution of normalized degenerate $U$-statistics proven in [?] can be formulated.

Theorem 9.5. Consider a sequence $\xi_1,\dots,\xi_n$, $n\ge k$, of independent $\mu$ distributed random variables and a bounded function $f(x_1,\dots,x_k)$ of $k$, $k\ge2$, variables canonical with respect to the measure $\mu$. Take the (degenerate) $U$-statistic $I_{n,k}(f)$ defined with the help of these quantities. There is some universal constant $C=C(k)<\infty$ depending only on the order $k$ of this $U$-statistic such that the inequality

$$P(n^{-k/2}|I_{n,k}(f)|>u)\le C\exp\Bigl\{-\frac1C\min_{\substack{(r,s)\colon\ 0\le r<k,\ 1\le s\le k-r\\ \text{or}\ r=k,\ s=0}}\ \min_{\substack{(A,P)\colon\ A\subset\{1,\dots,k\},\ |A|=r,\\ P\in\mathcal P(\{1,\dots,k\}\setminus A),\ |P|=s}}\Bigl(\frac{n^ru^2}{\alpha^2(A,P)}\Bigr)^{1/(2r+s)}\Bigr\}$$

holds for all $u>0$ with the above constant $C$ and the quantities $\alpha(A,P)$ defined in (77) and (78).
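
For $k=2$ the index set in this estimate is small enough to be written out by hand. The Python sketch below does this under explicit assumptions: the exponent is taken in the form $\min\bigl(n^ru^2/\alpha^2(A,P)\bigr)^{1/(2r+s)}$, where $r=|A|$ is the number of arguments in which a supremum is taken and $s=|P|$; the quantities $\alpha(A,P)$ are replaced by upper bounds of the type $\sigma$, $D$, $\sqrt{A_1}$, $\sqrt{A_2}$, $B$; and the names `u_statistic_exponent` and `order_two_terms` are hypothetical. The unspecified universal constant $C$ is left out.

```python
import math


def u_statistic_exponent(terms, n, u):
    """Smallest of the quantities (n^r * u^2 / alpha^2) ** (1 / (2r + s)).

    `terms` is a list of triples (r, s, alpha), where alpha is a user
    supplied (hypothetical) value or upper bound for alpha(A, P) with
    |A| = r and |P| = s.
    """
    values = []
    for r, s, alpha in terms:
        if alpha <= 0 or 2 * r + s <= 0:
            raise ValueError("need alpha > 0 and 2r + s > 0")
        values.append((n ** r * u ** 2 / alpha ** 2) ** (1.0 / (2 * r + s)))
    return min(values)


def order_two_terms(sigma, D, A1, A2, B):
    """All pairs (A, P) for k = 2, with bounds in place of alpha(A, P)."""
    return [
        (0, 1, sigma),          # A empty,    P = {{1, 2}}
        (0, 2, D),              # A empty,    P = {{1}, {2}}
        (1, 1, math.sqrt(A1)),  # A = {1},    P = {{2}}
        (1, 1, math.sqrt(A2)),  # A = {2},    P = {{1}}
        (2, 0, B),              # A = {1, 2}, P = empty partition
    ]


# Hypothetical numerical values, chosen only for illustration.
terms = order_two_terms(sigma=1.0, D=0.5, A1=1.0, A2=1.0, B=1.0)
print(u_statistic_exponent(terms, n=1000, u=0.1))
```

The five triples give the terms $u^2/\sigma^2$, $u/D$, $n^{1/3}u^{2/3}/A_j^{1/3}$, $j=1,2$, and $n^{1/2}u^{1/2}/B^{1/2}$, which have the same form as the terms in the estimate for degenerate $U$-statistics of order 2 formulated earlier in this subsection.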

It can be seen with the help of some calculation that Theorem 9.5 implies Theorem 4.3 for all orders $k\ge2$ if we disregard the presence of the unspecified universal constant $C$. (It has to be exploited that under the conditions of Theorem 4.3 $\alpha(A,P)\le\sigma$ if $|A|=r$ with $r=0$, $\alpha(A,P)\le1$ for $|A|=r\ge1$, $\sigma^2\le1$, and $n^{k/2}\sigma^{k+1}\ge u$.)

The proof of Theorems 9.4 and 9.5 is based, similarly to the proof of Theorems 4.1 and 4.3, on a good estimate of the (possibly high) moments of Wiener-Itô integrals and degenerate $U$-statistics. The proofs of these estimates in [?] and [?] are based on many deep and hard inequalities of different authors. One may ask whether the diagram formula, propagated in this work, which gives an explicit formula about these moments cannot be applied in the proof of these results. I think that the answer to this question is affirmative, and I even have some ideas how to carry out such a program. But at the time of writing this work I did not have enough time to work out the details.

A natural open problem is to find the large deviation estimates about the tail distribution of multiple Wiener-Itô integrals and $U$-statistics mentioned at the start of this subsection. Such results may better explain why the quantities $\alpha(P)$ and $\alpha(A,P)$ appear in the estimates of Theorems 9.4 and 9.5. It would be interesting to find the true value of the universal constants in these estimates or to get at least some partial results in this direction which would help in solving the following problem:

Problem. Consider a $k$-fold multiple Wiener-Itô integral $Z_{\mu,k}(f)$. Show that its distribution satisfies the relation

$$-\lim_{u\to\infty}u^{-2/k}\log P(|Z_{\mu,k}(f)|>u)=K(\mu,f)>0$$

with some number $K(\mu,f)>0$, and determine its value.


There appear some other natural problems relating to the above results. Thus 
for instance, it was assumed in all estimates about $U$-statistics discussed in this
work that their kernel functions are bounded. A closer study of this condition 
deserves some attention. It was explained in this paper that its role was to 
exclude the appearance of some irregular events with relatively large probability 
which would imply that only weak estimates hold in some cases interesting for 
us. One may ask whether this condition cannot be replaced by a weaker and 
more appropriate one in certain problems.

Finally, I mention the following problem.

Problem. Prove an estimate analogous to the result of Theorem 9.5 about the supremum of appropriate classes of $U$-statistics.

To solve the above problem one has to tackle some difficulties. In particular, to adapt the method of proof of previous results such a generalization of the multivariate version of Hoeffding's inequality (see [?]) has to be proved about the distribution of homogeneous polynomials of Rademacher functions where the bound depends not only on the variance of these random polynomials, but also on some quantities analogous to the expression $\alpha(P)$ introduced in (74).